This is an archive of the discontinued LLVM Phabricator instance.

Swap loop invariant GEP with loop variant GEP to allow more LICM.
ClosedPublic

Authored by • hulx2000 on Jul 8 2015, 10:23 PM.

Details

Reviewers

qcolombet
• zinob
atrick
• joker-eph-DISABLED
hfinkel
apazos

Summary

This patch changes the order of the GEPs generated by the Splitting GEPs
pass. Specifically, when one of the GEPs has a constant offset and the
base is loop invariant, we generate the GEP with the constant first
when beneficial, to expose more cases for LICM.

If Splitting GEP originally generated the following:
  do.body.i:
    %idxprom.i = sext i32 %shr.i to i64
    %2 = bitcast %typeD* %s to i8*
    %3 = shl i64 %idxprom.i, 2
    %uglygep = getelementptr i8, i8* %2, i64 %3
    %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032
  ...
Now it generates:
  do.body.i:
    %idxprom.i = sext i32 %shr.i to i64
    %2 = bitcast %typeD* %s to i8*
    %3 = shl i64 %idxprom.i, 2
    %uglygep = getelementptr i8, i8* %2, i64 1032
    %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3
  ...

For no-loop cases, the original way of generating GEPs seems to
expose more CSE cases, so we don't change the logic for no-loop
cases, and only limit our change to the specific case we are
interested in.
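The effect of the swap can be modeled with plain integer arithmetic. The sketch below is an illustration only, not code from the patch: the function names are hypothetical, addresses are modeled as wrapping `uint64_t` values, and the constant 1032 and the shift by 2 are taken from the IR above. Both orderings compute the same address, but only the swapped form has a first step that depends solely on the loop-invariant base.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Original order: the loop-variant index is added first, so neither step
// can be computed outside the loop.
uint64_t addrOriginal(uint64_t base, uint64_t idx) {
  uint64_t uglygep = base + (idx << 2); // loop variant
  return uglygep + 1032;                // depends on a loop-variant value
}

// Swapped order, step 1: depends only on the loop-invariant base.
uint64_t hoistedPart(uint64_t base) { return base + 1032; }

// Swapped order, step 2: the only work left inside the loop.
uint64_t addrSwapped(uint64_t hoisted, uint64_t idx) {
  return hoisted + (idx << 2);
}

// Sum the addresses touched by a loop, computing the invariant part once.
uint64_t sumAddressesHoisted(uint64_t base,
                             const std::vector<uint64_t> &idxs) {
  uint64_t hoisted = hoistedPart(base); // done once, before the loop
  uint64_t sum = 0;
  for (uint64_t idx : idxs)
    sum += addrSwapped(hoisted, idx);
  return sum;
}
```

This model says nothing about `inbounds`; it only shows why the reordering gives LICM something to hoist.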

Diff Detail

Repository: rL LLVM

Event Timeline

• hulx2000 updated this revision to Diff 29301.Jul 8 2015, 10:23 PM

• hulx2000 retitled this revision from to Extend LICM to hoist loop invariant GEP out.

• hulx2000 updated this object.

• hulx2000 added reviewers: hfinkel, • joker-eph-DISABLED.

• hulx2000 set the repository for this revision to rL LLVM.

• hulx2000 added subscribers: apazos, mcrosier.

Herald added a subscriber: tberghammer. Jul 8 2015, 10:23 PM

• hulx2000 added a subscriber: llvm-commits.Jul 8 2015, 10:45 PM

• hulx2000 added reviewers: mcrosier, • zinob, apazos.Jul 8 2015, 10:48 PM

mcrosier added reviewers: atrick, qcolombet.Jul 9 2015, 6:30 AM

Hi Lawrence,

I am not sure the transformation is legal, I’ll come back on that, but assuming it is, I think we could consider that moving the constant elements of a gep at the beginning of a gep-chain is the canonical representation.

If people agree on this, then this transformation would be better suited as an inst combine and we can get rid of all the is loop invariant checks.

Now, going back to the legality of the transformation, I am definitely not a language lawyer but it seems to me we may slightly modify the behavior of the program.
E.g., consider
C = gep A, B
D = gep C, 1023

Where gep C, 1023 produces an overflow.

After this transformation, we would get:
New = gep A, 1023
D = gep B, New

Now, the overflow may be on New = gep A, 1023, which potentially affects the behavior of the program.

Thoughts?

Cheers,
-Quentin

Hi, Quentin:

Thanks for your prompt response.

Just want to clarify, after the transformation, the new code would be:
C = gep A, 1023
D = gep C, B // Note that B is still the second operand.

Based on the test case that motivated this, those GEPs were generated by the Split GEP pass, which means there was originally only one GEP related to A. Overflow on the first or the second GEP should therefore not make much difference, because both were originally based on A. That is why I check that the types of A and C are the same.

However whether the check is strict enough to make sure the transformation is legal, I am not 100% sure, definitely I'd like to hear more comments.

About doing it in inst combining, I am not an expert of inst combining, my question is: is it possible for inst combining to hoist instruction outside loop? The important aspect of this transformation is to enable LICM to hoist a loop invariant computation outside loop, which is impossible before since the first GEP is loop variant, and we have to check if the first GEP is loop variant (or don't know), because this transformation is not needed otherwise -- If LICM can hoist the first GEP out, then it will hoist the second GEP out too.

More comments are welcome.

Regards

Lawrence Hu

I believe that if you restrict to 'inbounds' GEPs, you can avoid the overflow question.

About doing it in inst combining, I am not an expert of inst combining, my question is: is it possible for inst combining to hoist instruction outside loop?

No, but Quentin is right. When this kind of interchange is legal, we should establish a canonical form and use instcombine to transform to it. This will also allow CSE/GVN, etc. to pick up more of these cases.

Please upload your patches with full context.

I don't think GEP reassociation should be done as a canonical InstCombine for two reasons
(1) the inbounds property would need to be dropped
(2) Unlike Reassociate, InstCombine has no knowledge of the CFG

I think this should be part of SeparateConstOffsetFromGEP, which is also optimizing for register pressure and addressing modes. As part of this pass it is target configurable and controlled by EnableGEPOpt.

In D11051#204369, @atrick wrote:

I don't think GEP reassociation should be done as a canonical InstCombine for two reasons
(1) the inbounds property would need to be dropped

Can you even do this if you don't have inbounds GEPs? Nuno, do you have an opinion about this?

(2) Unlike Reassociate, InstCombine has no knowledge of the CFG

I think this should be part of SeparateConstOffsetFromGEP, which is also optimizing for register pressure and addressing modes. As part of this pass it is target configurable and controlled by EnableGEPOpt.

Thanks Hal & Andrew for your valuable comments.

I uploaded the full content diff.

Plus:

This patch was originally developed as part of Splitting GEP. I later moved it into LICM because I can reuse LoopInfo there, which saves an extra loop to collect information (it would not increase compile time much, but should be avoided if possible anyway) and avoids some messy checks needed to prevent performance regressions. It is completely OK to move it back to Splitting GEP if needed, as long as everyone agrees.

For the pattern that motivated this, inbounds is set to false by Splitting GEP.

If we want to do it in InstCombine, then we must make sure that Splitting GEP, InstCombine, and LICM run in sequence. I will check that.

I will not change the design of this patch for now, will do it after most of the people reach an agreement.

Regards

Lawrence Hu

In D11051#204371, @hfinkel wrote:

In D11051#204369, @atrick wrote:

I don't think GEP reassociation should be done as a canonical InstCombine for two reasons
(1) the inbounds property would need to be dropped

Can you even do this if you don't have inbounds GEPs? Nuno, do you have an opinion about this?

I think Andy is right.
If you don't have the inbounds tag, then you can safely perform this transformation, since LLVM's semantics guarantee that it will compute the overflowing value correctly. The implementation of the lowering and constant folding of GEPs in LLVM matches the semantics.
If you have the inbounds tag, then the transformation can still be performed, although inbounds has to be dropped in most cases, since otherwise we would need to prove that the intermediate results don't overflow.

This problem is similar to reassociation of integer expressions. It's always safe when there are no nsw/nuw, but if there are, they usually have to be dropped.

Nuno
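The analogy with integer reassociation can be made concrete. The sketch below is an illustration under stated assumptions, not LLVM code: `wrapAdd` stands in for a GEP without `inbounds` (plain wrapping arithmetic), and the hypothetical `checkedAdd` stands in for an operation whose intermediate overflow is not allowed, as with `nsw`. Reordering is always safe for the wrapping form, but for the checked form an intermediate step can overflow in one order and not in the other.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Wrapping add: unsigned arithmetic is defined modulo 2^64, so any
// reordering of the operands yields the same final value.
uint64_t wrapAdd(uint64_t a, uint64_t b) { return a + b; }

// Signed add that reports overflow instead of wrapping. Returns false,
// leaving `out` untouched, when the exact sum does not fit in int64_t.
bool checkedAdd(int64_t a, int64_t b, int64_t &out) {
  if (b > 0 && a > std::numeric_limits<int64_t>::max() - b)
    return false;
  if (b < 0 && a < std::numeric_limits<int64_t>::min() - b)
    return false;
  out = a + b;
  return true;
}
```

With a base close to the maximum value, adding a negative offset first and a positive constant second stays in range, while adding the positive constant first overflows. That is the case in which the flag would have to be dropped.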

Hi, Guys:

Can we reach an agreement about where to do it? I posted my older patch which is part of Split GEP at http://reviews.llvm.org/D11443.

Regards

You need to address this:

If you have the inbounds tag, then the transformation can still be performed, although inbounds has to be dropped in most cases, since otherwise we would need to prove that the intermediate results don't overflow.

By either restricting to GEPs that don't have inbounds, or dropping inbounds after performing the transformation. Specifically, I think what Nuno is saying is that, for example, if we have:

ptr = (p + o) + c

and both are inbounds, then we know that (p + o) does not overflow, and specifically is no more than one byte past the end of the object to which p belongs. We also know that (p + o) + c offers the same guarantees. We don't, however, know that (p + c) is necessarily in bounds. Imagine that (p + o) is one past the end of the object, and c is -1. Then p + c might point before the beginning of the object.
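The counterexample above can be checked with a small model. This is a sketch under simplifying assumptions, not LLVM's implementation: pointers are represented as signed byte offsets from the start of a single object, and an offset counts as in bounds when it lies in the range from 0 to the object size inclusive (one past the end is allowed). The names `inBounds`, `originalOrder`, and `swappedOrder` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// An offset is in bounds if it lies in [0, size]; pointing one byte past
// the end of the object is allowed.
bool inBounds(int64_t offset, int64_t size) {
  return offset >= 0 && offset <= size;
}

// Whether the result of each of the two GEP steps is in bounds.
struct Steps {
  bool first;
  bool second;
};

// Original order: (p + o) + c.
Steps originalOrder(int64_t p, int64_t o, int64_t c, int64_t size) {
  return {inBounds(p + o, size), inBounds(p + o + c, size)};
}

// Swapped order: (p + c) + o.
Steps swappedOrder(int64_t p, int64_t o, int64_t c, int64_t size) {
  return {inBounds(p + c, size), inBounds(p + c + o, size)};
}
```

For a 16-byte object with p at offset 0, o = 16, and c = -1, both steps of the original order are in bounds, while the first step of the swapped order lands at offset -1, before the start of the object.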

We should not give up too easily, however, because if p is some known object we might be able to easily determine whether (p + c) is still inbounds. One way to do this would be to call:

V->stripAndAccumulateInBoundsConstantOffsets

on the base pointer and then see if we can determine the size of the object. You can call getObjectSize(...) to attempt to figure this out (include/llvm/Analysis/MemoryBuiltins.h).

lib/Transforms/Scalar/LICM.cpp
416	This seems a bit odd. We should not be looking only at the very next instruction. At the very least, check for users in the same block.

Thanks Hal.

For your comments about stripAndAccumulateInBoundsConstantOffsets and getObjectSize, I didn't understand them 100%; please let me know if that is what you are thinking.

Regards

Herald added subscribers: srhines, danalbert. Jul 25 2015, 4:26 PM

In D11051#212087, @hulx2000 wrote:

Thanks Hal.

For your comments about stripAndAccumulateInBoundsConstantOffsets and getObjectSize, I didn't understand them 100%; please let me know if that is what you are thinking.

I mean that you can call stripAndAccumulateInBoundsConstantOffsets on the first GEP you form (the one with the constant index), which will give you a new base and an offset. You can then call getObjectSize on that new base, and that might give you the object size. If the Offset <= ObjectSize, then you can mark the first GEP as inbounds.
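The decision described here reduces to a single comparison once the offset and the object size are in hand. The sketch below does not call the LLVM APIs. It assumes the accumulated constant offset and the result of the size query have already been obtained, and models them with a plain integer and a hypothetical `ObjectSizeQuery` struct whose `known` field reflects that the size lookup may fail. The comparison is unsigned, so a negative offset is treated as a very large value and rejected.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the result of an object-size query: the size is only
// meaningful when `known` is true.
struct ObjectSizeQuery {
  bool known;
  uint64_t size;
};

// The first GEP (base + constant) may be marked inbounds only if the
// object size is known and the offset is at most that size. Equality is
// allowed because a pointer one past the end is still in bounds.
bool canMarkInBounds(int64_t offset, ObjectSizeQuery q) {
  return q.known && static_cast<uint64_t>(offset) <= q.size;
}
```

When the size is unknown the function answers false, which corresponds to leaving the flag off rather than guessing.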

Regards

Some comments about the inbounds changes. They also need to be covered by the regression tests.

lib/Transforms/Scalar/LICM.cpp
1145	I think this can be uge, because inbounds pointers are allowed to point to one byte past the end of the object (and thus, equal the size). Actually, you want Offset.ule (the offset must be less than or equal to the size for it to be inbounds).
1147	But you also need to set inbounds to false if the offset is larger than the size (even if the original GEP was inbounds). You need to set the inbounds on the second GEP to false.

• hulx2000 added inline comments.Jul 28 2015, 10:11 AM

lib/Transforms/Scalar/LICM.cpp
1147	Thanks Hal. The second GEP should be safe to have whatever inbound attribute as before, since originally the first is p+o, the second is p+o+c, now the first is p+c, the second is p+c+o==p+o+c.

atrick added inline comments.Jul 28 2015, 10:26 AM

lib/Transforms/Scalar/LICM.cpp
1147	I don't follow this comment. My reading of langref is that both the GEP's base and result must be inbounds. So if the first GEP is not inbounds and is the base of the second GEP, then the second GEP cannot be inbounds. (This is why I generally don't like this transformation.)

hfinkel added inline comments.Jul 28 2015, 10:45 AM

lib/Transforms/Scalar/LICM.cpp
1147	Andy is correct. The LangRef is clear on this matter, saying: "If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses..." So you can't have inbounds on the second GEP either.
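The flag handling that follows from this reading can be summarized in a few lines. This is a conservative sketch of the rule as discussed in this thread, not the code that was committed: `flagsAfterSwap` and its parameters are hypothetical names, and the proof that base + constant stays inside the object is assumed to come from elsewhere.

```cpp
#include <cassert>

// inbounds flags for the two GEPs after the swap.
struct GEPFlags {
  bool firstInBounds;
  bool secondInBounds;
};

// origInBounds: the original GEPs carried inbounds.
// constStepProven: base + constant was proven to stay inside the object.
GEPFlags flagsAfterSwap(bool origInBounds, bool constStepProven) {
  // The new first GEP keeps inbounds only when it can be justified.
  bool first = origInBounds && constStepProven;
  // The second GEP uses the first as its base pointer, so it can keep
  // inbounds only if the first one does.
  bool second = origInBounds && first;
  return {first, second};
}
```

If the proof is unavailable, both flags are dropped, which matches the conclusion that the second GEP cannot keep inbounds on its own.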

If you need to do this transformation in cases where the inbounds flag will be dropped it seems best to do it late in the pipeline (maybe even after LSR) as a target-controlled optimization. Arranging for the hoisting to happen seems like the easy part of this problem.

Out of curiosity, I wonder if the LSR fix here http://reviews.llvm.org/D11212 has any impact on this test case.

In D11051#213677, @atrick wrote:

If you need to do this transformation in cases where the inbounds flag will be dropped it seems best to do it late in the pipeline (maybe even after LSR) as a target-controlled optimization. Arranging for the hoisting to happen seems like the easy part of this problem.

I agree. Also, as it turns out, both NVPTX and PowerPC run LICM late in the pipeline, and AArch64 can as well when -aarch64-gep-opt is enabled (PowerPC also runs it under the control of -ppc-gep-opt, but unlike AArch64, it is currently on by default).

Out of curiosity, I wonder if the LSR fix here http://reviews.llvm.org/D11212 has any impact on this test case.

Oh yeah. I forgot about my first comment on this patch. I'll repeat it:

Hi, Andrew:

About your comments, I have asked to reach an agreement, and posted my earlier version of this patch (in SeparateConstOffsetFromGEP) to http://reviews.llvm.org/D11443 to see if you guys like it or not. 

Since you and Hal both agree it should be done somewhere else, I assume SeparateConstOffsetFromGEP is a good place to go. I will move the code back to SeparateConstOffsetFromGEP then; hopefully I can do better than http://reviews.llvm.org/D11443.

Regards

Lawrence Hu

Forgot to mention.

This transformation is useful: it improves one of our benchmarks by 7%, and its absence is the main reason LLVM is behind GCC on that benchmark.

• hulx2000 updated this revision to Diff 31193.Aug 1 2015, 2:34 PM

moved the code back to Splitting GEP pass.

mcrosier resigned from this revision.Aug 5 2015, 8:15 AM

mcrosier removed a reviewer: mcrosier.

I don't have any serious objections.

NumOfCommonBaseGEPs==1 seems very artificial. Can you generalize this to NumOfCommonBaseGEPs > 0?

Quentin may want to do an arm64 benchmark run before we ok this. I'll leave that up to him.

I am making a quick assembly diff of the test suite to have a sense of the impact.

Hi Lawrence,

There are no diffs on the test-suite, so you’re good on that side :).

I don’t have any problems with the patch and let you address Andy’s comment.

Thanks,
-Quentin

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
1092 ↗	(On Diff #31194)	hasOneUse

Thanks Andrew and Quentin.

Quentin, the whole Splitting GEP pass is turned off, could you please try -mllvm -aarch64-gep-opt=true or revert 89cc8dd3b84a636b2798028a994f4b71c78e0163, internally, I have to do that to see difference, of course, it depends on testcase.

I will address Andrew's comment in next few days, stuck on bugs now.

In D11051#218985, @hulx2000 wrote:

Quentin, the whole Splitting GEP pass is turned off, could you please try -mllvm -aarch64-gep-opt=true or revert 89cc8dd3b84a636b2798028a994f4b71c78e0163, internally, I have to do that to see difference, of course, it depends on testcase.

Lawrence,
Just a friendly reminder that the community prefers a svn revision number over a git hash any day.. :)

Chad

I didn't realize the pass was off. We don't need to benchmark the change in that case, although whoever is using the pass might want to ?!

Haha, I missed that too!

+1 for Andy’s comment.

Thanks, Chad:

commit 89cc8dd3b84a636b2798028a994f4b71c78e0163
Author: James Molloy <james.molloy@arm.com>
Date: Wed Apr 22 09:11:38 2015 +0000

[AArch64] Disable complex GEP optimization by default.

Enough concerns were raised that this optimization is pessimising some code patterns.

The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235491 91177308-0d34-0410-b5e6-96231b3b80d8

Lawrence, Chad, does this pass help you independent from the proposed patch? Is it something you will work toward enabling by default. If so, great. Otherwise, you probably want to decide which part will be controlled by flag and which part you want to enable by default. Obviously, changes to the default should be benchmarked.

Hi, Andy

I have seen both ups and downs with the Splitting GEP pass.

Yes, I am considering fixing a few more issues and then enabling it by default. Of course, we will do benchmark measurements at that time. Can I ask for your help then to measure your benchmarks?

Regards.

FWIW, I do not see any differences with -Os -mllvm -aarch64-gep-opt=true either.

I don't regularly benchmark llvm test-suite any more, but I can help find someone who does.

In D11051#219017, @atrick wrote:

Lawrence, Chad, does this pass help you independent from the proposed patch?

I believe we saw a 3-4% regression on Spec200x/mcf when James disabled the pass. This was on our internal branch in the context of LTO, which departs greatly from the community LTO flow (we're hoping to fix that soon).

Is it something you will work toward enabling by default. If so, great.

@hulx2000: Let me know if you'd like to discuss this offline. Have you or Ana discussed your plans with James?

Thanks, Quentin.

Interesting though, James said he turned off that pass because Apple was seeing non-negligible slowdowns. He said it is OK to turn it back on with tweaks, but "you’ll have to provide evidence to convince Gerolf".

Hmm, maybe with LTO then.

I haven’t tried that.

hfinkel added inline comments.Aug 10 2015, 12:05 AM

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
758 ↗	(On Diff #31194)	You shouldn't need to pre-compute this; you can examine the use list if necessary, something like this:
  auto TooManyUsesInLoop = [](Value *V, Loop *L) {
    int UsesInLoop = 0;
    for (User *U : V->users())
      if (Instruction *UserInst = dyn_cast<Instruction>(U))
        if (L->contains(UserInst))
          if (++UsesInLoop > 1)
            return true;
    return false;
  };
803 ↗	(On Diff #31194)	Then -> then could move -> can move
1060 ↗	(On Diff #31194)	This seems unnecessary (see above).
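The on-demand use count suggested in the comment on line 758 can be tried outside LLVM with minimal stand-ins. In the sketch below, which is an illustration and not the reviewed code, an instruction is just an integer id, `Loop` is a hypothetical struct holding the set of ids inside the loop, and the user list is passed in directly rather than obtained from a `Value`. The function stops scanning as soon as a second in-loop user is found, so no per-loop map has to be built in advance.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Stand-in for an instruction: identified by an integer id.
using InstId = int;

// Stand-in for a loop: the set of instruction ids it contains.
struct Loop {
  std::set<InstId> insts;
  bool contains(InstId i) const { return insts.count(i) != 0; }
};

// Returns true as soon as more than one user lies inside the loop.
bool tooManyUsesInLoop(const std::vector<InstId> &users, const Loop &L) {
  int usesInLoop = 0;
  for (InstId u : users)
    if (L.contains(u) && ++usesInLoop > 1)
      return true;
  return false;
}
```

Users outside the loop are ignored, so a base pointer with many uses elsewhere in the function still passes the check as long as at most one of them is in the loop.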

• hulx2000 updated this revision to Diff 32097.Aug 13 2015, 2:50 PM

ping

LGTM.

This revision is now accepted and ready to land.Sep 23 2015, 10:31 AM

merged as git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248420 91177308-0d34-0410-b5e6-96231b3b80d8

Revision Contents

Path                                         Size
include/llvm/Transforms/Utils/LoopUtils.h    2 lines
lib/Transforms/Scalar/LICM.cpp               138 lines
test/Transforms/LICM/hoist-gep.ll            49 lines

Diff 30834

include/llvm/Transforms/Utils/LoopUtils.h

	/// dominated by the specified block, and that are in the current loop) in depth			/// dominated by the specified block, and that are in the current loop) in depth
	/// first order w.r.t the DominatorTree. This allows us to visit definitions			/// first order w.r.t the DominatorTree. This allows us to visit definitions
	/// before uses, allowing us to hoist a loop body in one pass without iteration.			/// before uses, allowing us to hoist a loop body in one pass without iteration.
	/// Takes DomTreeNode, AliasAnalysis, LoopInfo, DominatorTree, DataLayout,			/// Takes DomTreeNode, AliasAnalysis, LoopInfo, DominatorTree, DataLayout,
	/// TargetLibraryInfo, Loop, AliasSet information for all instructions of the			/// TargetLibraryInfo, Loop, AliasSet information for all instructions of the
	/// loop and loop safety information as arguments. It returns changed status.			/// loop and loop safety information as arguments. It returns changed status.
	bool hoistRegion(DomTreeNode , AliasAnalysis , LoopInfo , DominatorTree ,			bool hoistRegion(DomTreeNode , AliasAnalysis , LoopInfo , DominatorTree ,
	TargetLibraryInfo , Loop , AliasSetTracker *,			TargetLibraryInfo , Loop , AliasSetTracker *,
	LICMSafetyInfo *);			LICMSafetyInfo , DenseMap<Value , long int> &);

	/// \brief Try to promote memory values to scalars by sinking stores out of			/// \brief Try to promote memory values to scalars by sinking stores out of
	/// the loop and moving loads to before the loop. We do this by looping over			/// the loop and moving loads to before the loop. We do this by looping over
	/// the stores in the loop, looking for stores to Must pointers which are			/// the stores in the loop, looking for stores to Must pointers which are
	/// loop invariant. It takes AliasSet, Loop exit blocks vector, loop exit blocks			/// loop invariant. It takes AliasSet, Loop exit blocks vector, loop exit blocks
	/// insertion point vector, PredIteratorCache, LoopInfo, DominatorTree, Loop,			/// insertion point vector, PredIteratorCache, LoopInfo, DominatorTree, Loop,
	/// AliasSet information for all instructions of the loop and loop safety			/// AliasSet information for all instructions of the loop and loop safety
	/// information as arguments. It returns changed status.			/// information as arguments. It returns changed status.

lib/Transforms/Scalar/LICM.cpp

#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/AliasSetTracker.h"		#include "llvm/Analysis/AliasSetTracker.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopPass.h"		#include "llvm/Analysis/LoopPass.h"
		#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/ScalarEvolution.h"		#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
static Instruction *CloneInstructionInExitBlock(const Instruction &I,
BasicBlock &ExitBlock,		BasicBlock &ExitBlock,
PHINode &PN,		PHINode &PN,
const LoopInfo *LI);		const LoopInfo *LI);
static bool canSinkOrHoistInst(Instruction &I, AliasAnalysis *AA,		static bool canSinkOrHoistInst(Instruction &I, AliasAnalysis *AA,
DominatorTree DT, TargetLibraryInfo TLI,		DominatorTree DT, TargetLibraryInfo TLI,
Loop CurLoop, AliasSetTracker CurAST,		Loop CurLoop, AliasSetTracker CurAST,
LICMSafetyInfo *SafetyInfo);		LICMSafetyInfo *SafetyInfo);

		/// \brief Check if operands of two GEP can be swapped and then one of them
		// can be hoisted.
		static bool
		canSwapOperandAndHoistGEPs(Instruction First, Loop CurLoop,
		DenseMap<Value *, int64_t> &NumOfGepBaseUses);
		static void swapGEPOperands(Instruction First, Instruction Second,
		const TargetLibraryInfo *TLI);

namespace {		namespace {
struct LICM : public LoopPass {		struct LICM : public LoopPass {
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
LICM() : LoopPass(ID) {		LICM() : LoopPass(ID) {
initializeLICMPass(*PassRegistry::getPassRegistry());		initializeLICMPass(*PassRegistry::getPassRegistry());
}		}

bool runOnLoop(Loop *L, LPPassManager &LPM) override;		bool runOnLoop(Loop *L, LPPassManager &LPM) override;
private:
TargetLibraryInfo *TLI; // TargetLibraryInfo for constant folding.		TargetLibraryInfo *TLI; // TargetLibraryInfo for constant folding.

// State that is updated as we process loops.		// State that is updated as we process loops.
bool Changed; // Set to true when we change anything.		bool Changed; // Set to true when we change anything.
BasicBlock *Preheader; // The preheader block of the current loop...		BasicBlock *Preheader; // The preheader block of the current loop...
Loop *CurLoop; // The current loop we are working on...		Loop *CurLoop; // The current loop we are working on...
AliasSetTracker *CurAST; // AliasSet information for the current loop...		AliasSetTracker *CurAST; // AliasSet information for the current loop...
DenseMap<Loop, AliasSetTracker> LoopToAliasSetMap;		DenseMap<Loop, AliasSetTracker> LoopToAliasSetMap;
		// Number of times a GEP base used inside a loop.
		DenseMap<Value *, int64_t> NumOfGepBaseUses;

/// cloneBasicBlockAnalysis - Simple Analysis hook. Clone alias set info.		/// cloneBasicBlockAnalysis - Simple Analysis hook. Clone alias set info.
void cloneBasicBlockAnalysis(BasicBlock From, BasicBlock To,		void cloneBasicBlockAnalysis(BasicBlock From, BasicBlock To,
Loop *L) override;		Loop *L) override;

/// deleteAnalysisValue - Simple Analysis hook. Delete value V from alias		/// deleteAnalysisValue - Simple Analysis hook. Delete value V from alias
/// set.		/// set.
void deleteAnalysisValue(Value V, Loop L) override;		void deleteAnalysisValue(Value V, Loop L) override;
for (Loop::iterator LoopItr = L->begin(), LoopItrE = L->end();
LoopToAliasSetMap.erase(InnerL);		LoopToAliasSetMap.erase(InnerL);
}		}

CurLoop = L;		CurLoop = L;

// Get the preheader block to move instructions into...		// Get the preheader block to move instructions into...
Preheader = L->getLoopPreheader();		Preheader = L->getLoopPreheader();

		NumOfGepBaseUses.clear();
// Loop over the body of this loop, looking for calls, invokes, and stores.		// Loop over the body of this loop, looking for calls, invokes, and stores.
// Because subloops have already been incorporated into AST, we skip blocks in		// Because subloops have already been incorporated into AST, we skip blocks in
// subloops.		// subloops.
//		//
for (Loop::block_iterator I = L->block_begin(), E = L->block_end();		for (Loop::block_iterator I = L->block_begin(), E = L->block_end();
I != E; ++I) {		I != E; ++I) {
BasicBlock BB = I;		BasicBlock BB = I;
if (LI->getLoopFor(BB) == L) // Ignore blocks in subloops.		if (LI->getLoopFor(BB) == L) // Ignore blocks in subloops.
CurAST->add(*BB); // Incorporate the specified basic block		CurAST->add(*BB); // Incorporate the specified basic block

		// Record number of times the base of a GEP is used inside a loop.
		for (BasicBlock::iterator II = BB->begin(), IE = BB->end(); II != IE; ++II)
		if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(II))
		NumOfGepBaseUses[GEP->getOperand(0)]++;
}		}

// Compute loop safety information.		// Compute loop safety information.
LICMSafetyInfo SafetyInfo;		LICMSafetyInfo SafetyInfo;
computeLICMSafetyInfo(&SafetyInfo, CurLoop);		computeLICMSafetyInfo(&SafetyInfo, CurLoop);

// We want to visit all of the instructions in this loop... that are not parts		// We want to visit all of the instructions in this loop... that are not parts
// of our subloops (they have already had their invariants hoisted out of		// of our subloops (they have already had their invariants hoisted out of
// their loop, into this loop, so there is no need to process the BODIES of		// their loop, into this loop, so there is no need to process the BODIES of
// the subloops).		// the subloops).
//		//
// Traverse the body of the loop in depth first order on the dominator tree so		// Traverse the body of the loop in depth first order on the dominator tree so
// that we are guaranteed to see definitions before we see uses. This allows		// that we are guaranteed to see definitions before we see uses. This allows
// us to sink instructions in one pass, without iteration. After sinking		// us to sink instructions in one pass, without iteration. After sinking
// instructions, we perform another pass to hoist them out of the loop.		// instructions, we perform another pass to hoist them out of the loop.
//		//
if (L->hasDedicatedExits())		if (L->hasDedicatedExits())
Changed |= sinkRegion(DT->getNode(L->getHeader()), AA, LI, DT, TLI, CurLoop,		Changed |= sinkRegion(DT->getNode(L->getHeader()), AA, LI, DT, TLI, CurLoop,
CurAST, &SafetyInfo);		CurAST, &SafetyInfo);
if (Preheader)		if (Preheader)
Changed |= hoistRegion(DT->getNode(L->getHeader()), AA, LI, DT, TLI,		Changed |= hoistRegion(DT->getNode(L->getHeader()), AA, LI, DT, TLI,
CurLoop, CurAST, &SafetyInfo);		CurLoop, CurAST, &SafetyInfo, NumOfGepBaseUses);

// Now that all loop invariants have been removed from the loop, promote any		// Now that all loop invariants have been removed from the loop, promote any
// memory references to scalars that we can.		// memory references to scalars that we can.
if (!DisablePromotion && (Preheader || L->hasDedicatedExits())) {		if (!DisablePromotion && (Preheader || L->hasDedicatedExits())) {
SmallVector<BasicBlock *, 8> ExitBlocks;		SmallVector<BasicBlock *, 8> ExitBlocks;
SmallVector<Instruction *, 8> InsertPts;		SmallVector<Instruction *, 8> InsertPts;
PredIteratorCache PIC;		PredIteratorCache PIC;

/// Walk the specified region of the CFG (defined by all blocks dominated by		/// Walk the specified region of the CFG (defined by all blocks dominated by
/// the specified block, and that are in the current loop) in depth first		/// the specified block, and that are in the current loop) in depth first
/// order w.r.t the DominatorTree. This allows us to visit definitions before		/// order w.r.t the DominatorTree. This allows us to visit definitions before
/// uses, allowing us to hoist a loop body in one pass without iteration.		/// uses, allowing us to hoist a loop body in one pass without iteration.
///		///
bool llvm::hoistRegion(DomTreeNode N, AliasAnalysis AA, LoopInfo *LI,		bool llvm::hoistRegion(DomTreeNode N, AliasAnalysis AA, LoopInfo *LI,
DominatorTree DT, TargetLibraryInfo TLI, Loop *CurLoop,		DominatorTree DT, TargetLibraryInfo TLI, Loop *CurLoop,
AliasSetTracker CurAST, LICMSafetyInfo SafetyInfo) {		AliasSetTracker CurAST, LICMSafetyInfo SafetyInfo,
		DenseMap<Value *, int64_t> &NumOfGepBaseUses) {
// Verify inputs.		// Verify inputs.
assert(N != nullptr && AA != nullptr && LI != nullptr &&		assert(N != nullptr && AA != nullptr && LI != nullptr && DT != nullptr &&
DT != nullptr && CurLoop != nullptr && CurAST != nullptr &&		CurLoop != nullptr && CurAST != nullptr && SafetyInfo != nullptr &&
SafetyInfo != nullptr && "Unexpected input to hoistRegion");		"Unexpected input to hoistRegion");
// Set changed as false.		// Set changed as false.
bool Changed = false;		bool Changed = false;
// Get basic block		// Get basic block
BasicBlock *BB = N->getBlock();		BasicBlock *BB = N->getBlock();
// If this subregion is not in the top level loop at all, exit.		// If this subregion is not in the top level loop at all, exit.
if (!CurLoop->contains(BB)) return Changed;		if (!CurLoop->contains(BB)) return Changed;
// Only need to process the contents of this block if it is not part of a		// Only need to process the contents of this block if it is not part of a
// subloop (which would already have been processed).		// subloop (which would already have been processed).
if (!inSubLoop(BB, CurLoop, LI))		if (!inSubLoop(BB, CurLoop, LI))
for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ) {		for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ) {
Instruction &I = *II++;		Instruction &I = *II++;
		Instruction *INext = II;

// Try constant folding this instruction. If all the operands are		// Try constant folding this instruction. If all the operands are
// constants, it is technically hoistable, but it would be better to just		// constants, it is technically hoistable, but it would be better to just
// fold it.		// fold it.
if (Constant *C = ConstantFoldInstruction(		if (Constant *C = ConstantFoldInstruction(
&I, I.getModule()->getDataLayout(), TLI)) {		&I, I.getModule()->getDataLayout(), TLI)) {
DEBUG(dbgs() << "LICM folding inst: " << I << " --> " << *C << '\n');		DEBUG(dbgs() << "LICM folding inst: " << I << " --> " << *C << '\n');
CurAST->copyValue(&I, C);		CurAST->copyValue(&I, C);
CurAST->deleteValue(&I);		CurAST->deleteValue(&I);
I.replaceAllUsesWith(C);		I.replaceAllUsesWith(C);
I.eraseFromParent();		I.eraseFromParent();
continue;		continue;
}		}

// Try hoisting the instruction out to the preheader. We can only do this		// Try hoisting the instruction out to the preheader. We can only do this
// if all of the operands of the instruction are loop invariant and if it		// if all of the operands of the instruction are loop invariant and if it
// is safe to hoist the instruction.		// is safe to hoist the instruction.
//		//
if (CurLoop->hasLoopInvariantOperands(&I) &&		if (CurLoop->hasLoopInvariantOperands(&I) &&
canSinkOrHoistInst(I, AA, DT, TLI, CurLoop, CurAST, SafetyInfo) &&		canSinkOrHoistInst(I, AA, DT, TLI, CurLoop, CurAST, SafetyInfo) &&
isSafeToExecuteUnconditionally(I, DT, TLI, CurLoop, SafetyInfo,		isSafeToExecuteUnconditionally(I, DT, TLI, CurLoop, SafetyInfo,
CurLoop->getLoopPreheader()->getTerminator()))		CurLoop->getLoopPreheader()->getTerminator()))
Changed |= hoist(I, CurLoop->getLoopPreheader());		Changed |= hoist(I, CurLoop->getLoopPreheader());

		if (canSwapOperandAndHoistGEPs(&I, CurLoop, NumOfGepBaseUses)) {
		hfinkel (inline comment, marked Done): This seems a bit odd. We should not be looking only at the very next instruction. At the very least, check for users in the same block.
		swapGEPOperands(&I, INext, TLI);
		hoist(I, CurLoop->getLoopPreheader());
		}
}		}

const std::vector<DomTreeNode*> &Children = N->getChildren();		const std::vector<DomTreeNode*> &Children = N->getChildren();
for (unsigned i = 0, e = Children.size(); i != e; ++i)		for (unsigned i = 0, e = Children.size(); i != e; ++i)
Changed |=		Changed |= hoistRegion(Children[i], AA, LI, DT, TLI, CurLoop, CurAST,
hoistRegion(Children[i], AA, LI, DT, TLI, CurLoop, CurAST, SafetyInfo);		SafetyInfo, NumOfGepBaseUses);
return Changed;		return Changed;
}		}

/// Computes loop safety information, checks loop body & header		/// Computes loop safety information, checks loop body & header
/// for the possiblity of may throw exception.		/// for the possiblity of may throw exception.
///		///
void llvm::computeLICMSafetyInfo(LICMSafetyInfo * SafetyInfo, Loop * CurLoop) {		void llvm::computeLICMSafetyInfo(LICMSafetyInfo * SafetyInfo, Loop * CurLoop) {
assert(CurLoop != nullptr && "CurLoop cant be null");		assert(CurLoop != nullptr && "CurLoop cant be null");
/// Little predicate that returns true if the specified basic block is in		/// Little predicate that returns true if the specified basic block is in
/// a subloop of the current one, not the current one itself.		/// a subloop of the current one, not the current one itself.
///		///
static bool inSubLoop(BasicBlock *BB, Loop *CurLoop, LoopInfo *LI) {		static bool inSubLoop(BasicBlock *BB, Loop *CurLoop, LoopInfo *LI) {
assert(CurLoop->contains(BB) && "Only valid if BB is IN the loop");		assert(CurLoop->contains(BB) && "Only valid if BB is IN the loop");
return LI->getLoopFor(BB) != CurLoop;		return LI->getLoopFor(BB) != CurLoop;
}		}

		static bool
		canSwapOperandAndHoistGEPs(Instruction *First, Loop *CurLoop,
		DenseMap<Value *, int64_t> &NumOfGepBaseUses) {
		if (!First || First->getNumUses() != 1)
		return false;

		Instruction *Second = cast<Instruction>(*First->user_begin());
		if (!Second || First->getParent() != Second->getParent())
		return false;

		GetElementPtrInst *FirstGEP = dyn_cast<GetElementPtrInst>(First);
		GetElementPtrInst *SecondGEP = dyn_cast<GetElementPtrInst>(Second);
		if (!FirstGEP || !SecondGEP)
		return false;

		unsigned FirstNum = FirstGEP->getNumOperands();
		unsigned SecondNum = SecondGEP->getNumOperands();
		// Give up if the number of operands are not 2.
		if (FirstNum != SecondNum || SecondNum != 2)
		return false;

		Value *FirstBase = FirstGEP->getOperand(0);
		Value *SecondBase = SecondGEP->getOperand(0);
		Value *FirstOffset = FirstGEP->getOperand(1);
		Value *SecondOffset = SecondGEP->getOperand(1);
		// Give up if the first base is not loop invariant or used more than once.
		if (!CurLoop->isLoopInvariant(FirstBase) || NumOfGepBaseUses[FirstBase] != 1)
		return false;

		// Give up if the second operand of the first GEP is loop invariant.
		if (CurLoop->isLoopInvariant(FirstOffset))
		return false;

		// Give up if the second operand of second GEP is not constant.
		if (!dyn_cast<ConstantInt>(SecondOffset))
		return false;

		// Give up if base doesn't have same type.
		if (FirstBase->getType() != SecondBase->getType())
		return false;

		// Give up if second is not the only user of First.
		if (!First->hasOneUse() ||
		dyn_cast<Instruction>(*First->user_begin()) != Second ||
		dyn_cast<Instruction>(SecondBase) != First)
		return false;

		Instruction *FirstOffsetDef = dyn_cast<Instruction>(FirstOffset);

		// Check if the second operand of first GEP has constant coefficient.
		// For an example, for the following code, we won't gain anything by
		// hoisting the second GEP out because the second GEP can be folded away.
		// %scevgep.sum.ur159 = add i64 %idxprom48.ur, 256
		// %67 = shl i64 %scevgep.sum.ur159, 2
		// %uglygep160 = getelementptr i8* %65, i64 %67
		// %uglygep161 = getelementptr i8* %uglygep160, i64 -1024

		// Skip constant shift instruction which may be generated by Splitting GEPs.
		if (FirstOffsetDef && FirstOffsetDef->isShift() &&
		dyn_cast<ConstantInt>(FirstOffsetDef->getOperand(1)))
		FirstOffsetDef = dyn_cast<Instruction>(FirstOffsetDef->getOperand(0));

		// Give up if FirstOffsetDef is an Add or Sub with constant.
		// Because it may not profitable at all due to constant folding.
		if (FirstOffsetDef)
		if (BinaryOperator *BO = dyn_cast<BinaryOperator>(FirstOffsetDef)) {
		unsigned opc = BO->getOpcode();
		if ((opc == Instruction::Add || opc == Instruction::Sub) &&
		(dyn_cast<ConstantInt>(BO->getOperand(0)) ||
		dyn_cast<ConstantInt>(BO->getOperand(1))))
		return false;
		}
		return true;
		}

		static void swapGEPOperands(Instruction *First, Instruction *Second,
		const TargetLibraryInfo *TLI) {
		Value *Offset1 = First->getOperand(1);
		Value *Offset2 = Second->getOperand(1);
		First->setOperand(1, Offset2);
		Second->setOperand(1, Offset1);

		// We changed p+o+c to p+c+o, p+c may not be inbound anymore.
		// However the inbound attribute of the p+c+o should be the same.
		GetElementPtrInst *FirstGEP = dyn_cast<GetElementPtrInst>(First);
		const DataLayout &DAL = First->getModule()->getDataLayout();
		APInt Offset(DAL.getPointerSizeInBits(
		cast<PointerType>(First->getType())->getAddressSpace()),
		0);
		Value *NewBase =
		First->stripAndAccumulateInBoundsConstantOffsets(DAL, Offset);
		uint64_t ObjectSize;
		if (!getObjectSize(NewBase, ObjectSize, DAL, TLI))
		FirstGEP->setIsInBounds(false);
		else if (Offset.ule(ObjectSize))
		hfinkel (inline comment, marked Done): I think this can be uge, because inbounds pointers are allowed to point to one byte past the end of the object (and thus, equal the size). Actually, you want Offset.ule (the offset must be less than or equal to the size for it to be inbounds).
		FirstGEP->setIsInBounds(true);
		else
		hfinkel (inline comment, marked Done): But you also need to set inbounds to false if the offset is larger than the size (even if the original GEP was inbounds). You need to set the inbounds on the second GEP to false.
		hulx2000 (author, inline comment, Not Done): Thanks Hal. The second GEP should be safe to keep whatever inbounds attribute it had before: originally the first is p+o and the second is p+o+c; now the first is p+c and the second is p+c+o == p+o+c.
		atrick (inline comment, Not Done): I don't follow this comment. My reading of langref is that both the GEP's base and result must be inbounds. So if the first GEP is not inbounds and is the base of the second GEP, then the second GEP cannot be inbounds. (This is why I generally don't like this transformation.)
		hfinkel (inline comment, Not Done): Andy is correct. The LangRef is clear on this matter, saying: "If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses..." So you can't have inbounds on the second GEP either.
		FirstGEP->setIsInBounds(false);
		}
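
The inbounds discussion in the inline comments above can be summarized with a small standalone model. This is only a sketch of the rule the reviewers describe, not LLVM code: `InBoundsFlags` and `flagsAfterSwap` are hypothetical names, `ObjectSizeKnown`/`ObjectSize` stand in for what `getObjectSize` would report, and the sketch assumes the stricter reading from the thread (a GEP whose base is not in bounds cannot itself be inbounds).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not an LLVM API): inbounds flags after the swap.
struct InBoundsFlags {
  bool First;  // flag for p + c
  bool Second; // flag for (p + c) + o
};

// ConstOffset is the constant c moved into the first GEP.
// SecondWasInBounds is the flag carried by the original p + o + c.
InBoundsFlags flagsAfterSwap(bool ObjectSizeKnown, uint64_t ObjectSize,
                             int64_t ConstOffset, bool SecondWasInBounds) {
  // One past the end of the object still counts as in bounds,
  // so the comparison is <= (the "ule" from the review), not <.
  bool FirstOK = ObjectSizeKnown && ConstOffset >= 0 &&
                 static_cast<uint64_t>(ConstOffset) <= ObjectSize;
  // Assumption taken from the thread: an inbounds GEP needs an
  // in-bounds base, so the second flag cannot outlive the first.
  return {FirstOK, FirstOK && SecondWasInBounds};
}
```

Under this model an unknown object size, a negative constant, or a constant larger than the object all clear both flags, which matches the reviewers' position rather than the behaviour of the patch as posted.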

test/Transforms/LICM/hoist-gep.ll

This file was added.

				; RUN: opt < %s -licm -S | FileCheck %s
				target triple = "aarch64--linux-android"

				%typeD = type { i32, i32, [256 x i32], [257 x i32] }

				; Function Attrs: noreturn nounwind uwtable
				define i32 @test1(%typeD* nocapture %s) {
				entry:
				; CHECK-LABEL: entry:
				; CHECK: %uglygep = getelementptr i8, i8* %0, i64 1032
				; CHECK-NEXT: br label %do.body.i
				%tPos = getelementptr inbounds %typeD, %typeD* %s, i64 0, i32 0
				%k0 = getelementptr inbounds %typeD, %typeD* %s, i64 0, i32 1
				%.pre = load i32, i32* %tPos, align 4
				br label %do.body.i

				do.body.i:
				; CHECK-LABEL: do.body.i:
				; CHECK: %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3
				; CHECK-NOT: %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032
				; CHECK: br i1 %cmp1.i, label %X__indexIntoF.exit, label %do.body.i.backedge
				%0 = phi i32 [ 256, %entry ], [ %.be, %do.body.i.backedge ]
				%1 = phi i32 [ 0, %entry ], [ %.be6, %do.body.i.backedge ]
				%add.i = add nsw i32 %1, %0
				%shr.i = ashr i32 %add.i, 1
				%idxprom.i = sext i32 %shr.i to i64
				%2 = bitcast %typeD* %s to i8*
				%3 = shl i64 %idxprom.i, 2
				%uglygep = getelementptr i8, i8* %2, i64 %3
				%uglygep7 = getelementptr i8, i8* %uglygep, i64 1032
				%4 = bitcast i8* %uglygep7 to i32*
				%5 = load i32, i32* %4, align 4
				%cmp.i = icmp sle i32 %5, %.pre
				%na.1.i = select i1 %cmp.i, i32 %0, i32 %shr.i
				%nb.1.i = select i1 %cmp.i, i32 %shr.i, i32 %1
				%sub.i = sub nsw i32 %na.1.i, %nb.1.i
				%cmp1.i = icmp eq i32 %sub.i, 1
				br i1 %cmp1.i, label %X__indexIntoF.exit, label %do.body.i.backedge

				do.body.i.backedge: ; preds = %do.body.i, %X__indexIntoF.exit
				%.be = phi i32 [ %na.1.i, %do.body.i ], [ 256, %X__indexIntoF.exit ]
				%.be6 = phi i32 [ %nb.1.i, %do.body.i ], [ 0, %X__indexIntoF.exit ]
				br label %do.body.i

				X__indexIntoF.exit: ; preds = %do.body.i
				store i32 %nb.1.i, i32* %k0, align 4
				br label %do.body.i.backedge
				}
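
For readers less used to IR, the address arithmetic this test checks can be sketched in plain C++. This is an illustration under assumptions, not part of the patch: the struct mirrors `%typeD`, the field names `a` and `b` and both helper functions are hypothetical, and the layout assumes a 4-byte `int32_t` with no padding (which is what makes the fourth field land at byte 1032).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors %typeD = type { i32, i32, [256 x i32], [257 x i32] }.
struct TypeD {
  int32_t tPos;
  int32_t k0;
  int32_t a[256]; // hypothetical field name
  int32_t b[257]; // hypothetical field name; starts at byte 1032
};

// Original GEP order: (base + variant offset) + 1032,
// so both additions stay inside the loop.
const char *addrVariantFirst(const TypeD *S, int64_t Idx) {
  const char *Base = reinterpret_cast<const char *>(S);
  return (Base + Idx * 4) + offsetof(TypeD, b);
}

// Swapped order: Hoisted = base + 1032 is loop invariant and can be
// computed once in the preheader; only the variant add stays in the loop.
const char *addrConstantFirst(const char *Hoisted, int64_t Idx) {
  return Hoisted + Idx * 4;
}
```

Both functions yield the same address for every in-range index; the swapped form simply exposes `base + 1032` as a value LICM can hoist, which is what the `CHECK` lines in the `entry` block look for.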
