This is an archive of the discontinued LLVM Phabricator instance.

Allow LLE/LD and the loop versioning infrastructure to use SCEV predicates
Closed Public

Authored by sbaranga on Nov 2 2015, 8:39 AM.

Details

Reviewers

Commits

rG2910a4f6b156: Allow LLE/LD and the loop versioning infrastructure to use SCEV predicates
rL252467: Allow LLE/LD and the loop versioning infrastructure to use SCEV predicates

Summary

LAA currently generates a set of SCEV predicates that must be checked by users.
In the case of Loop Distribute/Loop Load Elimination, no such predicates could have
been emitted, since we don't allow stride versioning. However, in the future there
could be SCEV predicates that will need to be checked.

This change adds support for SCEV predicate versioning in the Loop Distribute, Loop
Load Eliminate and the loop versioning infrastructure.
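
As a rough illustration of what versioning on both kinds of checks means, the standalone C++ sketch below guards a fast loop with an alias check and a stride condition, and falls back to the original loop otherwise. The names `rangesDisjoint` and `addOne` are hypothetical and do not come from the patch; the stride-equals-one test merely stands in for a SCEV predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a run-time alias check: true when the byte
// ranges [A, A+N) and [B, B+N) do not overlap.
static bool rangesDisjoint(const int *A, const int *B, std::size_t N) {
  auto LoA = reinterpret_cast<std::uintptr_t>(A);
  auto LoB = reinterpret_cast<std::uintptr_t>(B);
  std::size_t Bytes = N * sizeof(int);
  return LoA + Bytes <= LoB || LoB + Bytes <= LoA;
}

// Returns true when the "versioned" loop ran, false when the fall-back ran.
bool addOne(int *Dst, const int *Src, std::size_t N, std::size_t Stride) {
  bool AliasChecksPass = rangesDisjoint(Dst, Src, N);
  bool PredicatesHold = (Stride == 1); // stand-in for a SCEV predicate
  if (AliasChecksPass && PredicatesHold) {
    // Versioned loop: may assume unit stride and no overlap.
    for (std::size_t I = 0; I < N; ++I)
      Dst[I] = Src[I] + 1;
    return true;
  }
  // Fall-back loop: the original code, valid for any stride and overlap.
  for (std::size_t I = 0; I < N; ++I)
    Dst[I * Stride] = Src[I * Stride] + 1;
  return false;
}
```

Both loops compute the same result whenever the checks pass; the point of the split is that a transform only has to be correct for the guarded copy.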

Diff Detail

Event Timeline

sbaranga updated this revision to Diff 38929. Nov 2 2015, 8:39 AM

sbaranga retitled this revision from to Allow Loop Distribute and the loop versioning infrastructure to use SCEV predicates.

sbaranga updated this object.

sbaranga added a reviewer: anemet.

sbaranga added a subscriber: llvm-commits.

Herald added a subscriber: sanjoy. Nov 2 2015, 8:39 AM

mssimpso added a subscriber: mssimpso. Nov 3 2015, 6:59 AM

I have a high-level question. How should LoopDist work with SCEVAssumptions? If you look at alias checks we filter those in includeOnlyCrossPartitionChecks to get rid of unnecessary checks.

Could something like this be implemented with SCEVAssumptions? E.g. if you have a may-wrapping pointer that can't alias with anything in the other partition, we don't need to issue the non-wrapping check in order to make distribution correct.

lib/Transforms/Scalar/LoopDistribute.cpp
780–781	This comment needs updating now since we don't only version to disambiguate pointers anymore.
789	Extra () in the first term.
lib/Transforms/Utils/LoopVersioning.cpp
26–38	Now that we will feed LVers with two sets of checks optionally, we should probably move away from taking these in the ctor. I think that we should have two members like addAliasChecks and addSCEVChecks or something. I also don't think that we should take LAI but SCEVPredUnion instead. The second ctor below is fine because the idea there is to take everything from LAI without filtering it.
57	s/MemCheckBB/RuntimeCheckBB
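
The cross-partition filtering mentioned in the question above can be sketched in a few lines. This is a simplified model, not the code of includeOnlyCrossPartitionChecks: pointers are plain strings, a check is a pair of pointers, and `keepCrossPartitionChecks` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a check compares two named pointers.
using PointerCheck = std::pair<std::string, std::string>;

// Keeps only the checks whose two pointers live in different partitions.
// A check between pointers of the same partition cannot affect the
// correctness of distribution, so it is dropped.
std::vector<PointerCheck>
keepCrossPartitionChecks(const std::vector<PointerCheck> &AllChecks,
                         const std::map<std::string, int> &PtrToPartition) {
  std::vector<PointerCheck> Kept;
  for (const PointerCheck &C : AllChecks)
    if (PtrToPartition.at(C.first) != PtrToPartition.at(C.second))
      Kept.push_back(C);
  return Kept;
}
```

The open question in the thread is whether SCEV predicates could be pruned in a comparable way.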

Hi Adam.

In D14240#282030, @anemet wrote:

I have a high-level question. How should LoopDist work with SCEVAssumptions? If you look at alias checks we filter those in includeOnlyCrossPartitionChecks to get rid of unnecessary checks.

That's an interesting problem. Currently we're only making enough assumptions to return true for LAI.canVectorizeMemory() (and LD checks that anyway).
Since it does that, I think it should add all predicates - for the current behaviour.

However, if we weren't interested in the answer from canVectorizeMemory(), it should be possible to change MemoryDepChecker::areDepsSafe in LAA so that we have a per-dependence predicate. Then we can ignore the predicates for dependences we don't care about. Do you think this approach makes sense?

Thanks,
Silviu

I thought about this some more and besides dependences we would also need to figure out for each MemCheck what predicate needs to be true in order for us to be able to emit it.

Applied review comments.

sbaranga added inline comments. Nov 5 2015, 8:13 AM

lib/Transforms/Utils/LoopVersioning.cpp
26–38	Removing the Checks parameter made the two constructors have identical arguments. Since we want to be able to construct with all the checks from the LAI, I've added a bool parameter to enable this.

In D14240#282137, @sbaranga wrote:

However, if we weren't interested in the answer from canVectorizeMemory(), it should be possible to change MemoryDepChecker::areDepsSafe in LAA so that we have a per-dependence predicate. Then we can ignore the predicates for dependences we don't care about. Do you think this approach makes sense?

I am not saying we should implement this but I'd like to think about this now before committing to a design where we'd have no way of dropping unnecessary predicates.

I am not sure that we need per-dependence predicates though. I think it needs to be per alias set. If you have a pointer that may wrap, it can now have a dependence on any of the pointers in the same alias set. It is still OK to drop this predicate for loop distribution if all members of this alias set fall into the same partition.

Where are we going to use SCEVPreds with alias checks? Is it to try to shape a SCEV into an AddRec?

include/llvm/Transforms/Utils/LoopVersioning.h
74–76	Sorry I didn't realize that by suggesting "add" in the name you would make this "additive". The problem is now we lost the nice "move constructibility" of Checks. How about making this setAliasChecks which would still take a value and then restoring the std::move's. Would a set* interface work for SCEVUnionPredicate?
99–104	Rename to AliasChecks?
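
The per-alias-set idea from the comment above can be modeled as follows. This is only a sketch under stated assumptions: `AliasSetPredicate` and `neededForDistribution` are hypothetical names, and the patch under review does not implement this pruning.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: one predicate attached to an alias set, together
// with the partition index of every pointer in that alias set.
struct AliasSetPredicate {
  int AliasSetId;
  std::vector<int> MemberPartitions;
};

// The predicate only matters for distribution if the alias set spans
// more than one partition.
bool neededForDistribution(const AliasSetPredicate &P) {
  for (int Part : P.MemberPartitions)
    if (Part != P.MemberPartitions.front())
      return true;
  return false;
}

// Returns the ids of the alias sets whose predicate must still be checked.
std::vector<int>
requiredAliasSets(const std::vector<AliasSetPredicate> &Preds) {
  std::vector<int> Required;
  for (const AliasSetPredicate &P : Preds)
    if (neededForDistribution(P))
      Required.push_back(P.AliasSetId);
  return Required;
}
```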

In D14240#283209, @anemet wrote:

In D14240#282137, @sbaranga wrote:

However, if we weren't interested in the answer from canVectorizeMemory(), it should be possible to change MemoryDepChecker::areDepsSafe in LAA so that we have a per-dependence predicate. Then we can ignore the predicates for dependences we don't care about. Do you think this approach makes sense?

I am not saying we should implement this but I'd like to think about this now before committing to a design where we'd have no way of dropping unnecessary predicates.

I am not sure that we need per-dependence predicates though. I think it needs to be per alias set. If you have a pointer that may wrap, it can now have a dependence on any of the pointers in the same alias set. It is still OK to drop this predicate for loop distribution if all members of this alias set fall into the same partition.

Where are we going to use SCEVPreds with alias checks? Is it to try to shape a SCEV into an AddRec?

Yes, we would use these to get AddRec expressions. I've also found one additional use case where it's already an AddRec and we want to check for nuw or nsw (see isStridedPtr).

include/llvm/Transforms/Utils/LoopVersioning.h
74–76	Ok, I understand. It wouldn't work at the moment for SCEVUnionPredicate. We need to define copy and move constructors for it. I'll create another review for that.
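
For the nuw case mentioned above, a no-wrap predicate on an AddRec {Start,+,Step} amounts to a run-time condition of roughly the following shape. The sketch uses an 8-bit unsigned induction variable to keep the numbers small; `noUnsignedWrap8` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// True when Start + Step * I fits in 8 unsigned bits for every
// I in [0, TripCount). Because Step is non-negative, it is enough to
// test the value reached on the last iteration.
bool noUnsignedWrap8(std::uint8_t Start, std::uint8_t Step,
                     std::uint32_t TripCount) {
  if (TripCount == 0)
    return true;
  // Evaluate in 64 bits so the test itself cannot wrap.
  std::uint64_t Last =
      std::uint64_t(Start) + std::uint64_t(Step) * (TripCount - 1);
  return Last <= UINT8_MAX;
}
```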

Added versioning to LLE as well. The same outstanding issues remain as with Loop Distribute.
Now using move/copy constructors to pass both alias checks and SCEV predicates to Loop Versioning.
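
The set-by-value interface discussed in the inline comments is a common C++ idiom. A minimal sketch, assuming a hypothetical `VersioningSketch` class in which a check is just a string:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for LoopVersioning; a check is modeled as a string.
class VersioningSketch {
public:
  // Takes the checks by value: a caller may copy (pass an lvalue) or hand
  // over its buffer (pass std::move(...)). The parameter is then moved
  // into the member, so the setter itself never copies.
  void setAliasChecks(std::vector<std::string> Checks) {
    AliasChecks = std::move(Checks);
  }

  std::size_t numAliasChecks() const { return AliasChecks.size(); }
  const std::string *storage() const { return AliasChecks.data(); }

private:
  std::vector<std::string> AliasChecks;
};
```

A setter replaces the stored checks instead of appending to them, which is what allows the caller's container to be moved in wholesale.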

Hi Adam,

I've uploaded a new patch, and had to make LLE use this as well.

Thanks,
Silviu

include/llvm/Transforms/Utils/LoopVersioning.h
74–76	I was confused about the ability of DenseMap to use copy/move constructors, sorry. It turns out we can actually do the same thing as for alias checks.

LGTM. We will have to add thresholds for the SCEV checks as well but right now these are only placeholders since these passes don't use stride checks.

This revision is now accepted and ready to land. Nov 8 2015, 11:34 PM

Add thresholds for LD and LLE when versioning with SCEV predicates.
These passes won't actually use the predicates (they don't use stride versioning), so the actual value doesn't matter right now.
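
The threshold logic can be pictured as a three-way decision. The sketch below is a standalone model: `UnionPredicateSketch`, `Decision` and `decide` are hypothetical names, and treating an empty union as always true is an assumption of the model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a union predicate: one entry per predicate.
struct UnionPredicateSketch {
  std::vector<int> Preds;
  // Complexity is estimated as the number of predicates in the union.
  unsigned getComplexity() const {
    return static_cast<unsigned>(Preds.size());
  }
  // Assumption of this model: an empty union needs no run-time check.
  bool isAlwaysTrue() const { return Preds.empty(); }
};

enum class Decision { GiveUp, Version, NoVersioning };

Decision decide(const UnionPredicateSketch &P, std::size_t NumAliasChecks,
                unsigned Threshold) {
  if (P.getComplexity() > Threshold)
    return Decision::GiveUp; // too many SCEV run-time checks
  if (!P.isAlwaysTrue() || NumAliasChecks != 0)
    return Decision::Version; // at least one check must be emitted
  return Decision::NoVersioning;
}
```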

sbaranga retitled this revision from Allow Loop Distribute and the loop versioning infrastructure to use SCEV predicates to Allow LLE/LD and the loop versioning infrastructure to use SCEV predicates. Nov 9 2015, 5:07 AM

sbaranga updated this object.

sbaranga closed this revision. Nov 9 2015, 5:28 AM

Committed in r252467, thanks!

Hi Adam,

In D14240#283209, @anemet wrote:

I am not saying we should implement this but I'd like to think about this now before committing to a design where we'd have no way of dropping unnecessary predicates.

I am not sure that we need per-dependence predicates though. I think it needs to be per alias set. If you have a pointer that may wrap, it can now have a dependence on any of the pointers in the same alias set. It is still OK to drop this predicate for loop distribution if all members of this alias set fall into the same partition.

Where are we going to use SCEVPreds with alias checks? Is it to try to shape a SCEV into an AddRec?

Did you get a chance to think about this? Should we start a thread on llvm-dev?

I guess this would block the addition of new SCEV predicate types.

Thanks,
Silviu

Hi Silviu,

In D14240#288784, @sbaranga wrote:

Did you get a chance to think about this? Should we start a thread on llvm-dev?

I guess this would block the addition of new SCEV predicate types.

Sorry about the delay. No, I don't think this should hold up further work.

Whether we tag predicates with the affected dependences or use some other mechanism, that is an additional improvement to try to reduce the number of predicates. As long as we keep the threshold for the number of predicates low for distribution, we should be good.

Thanks, that makes sense to me. FWIW this means that after http://reviews.llvm.org/D14296 and a small update to LLE we should be ready to add the no overflow predicates, so we're not that far away.

-Silviu

Revision Contents

Path	Size
include/llvm/Analysis/ScalarEvolution.h	4 lines
include/llvm/Transforms/Utils/LoopVersioning.h	32 lines
lib/Transforms/Scalar/LoopDistribute.cpp	26 lines
lib/Transforms/Scalar/LoopLoadElimination.cpp	17 lines
lib/Transforms/Utils/LoopVersioning.cpp	75 lines
test/Transforms/LoopDistribute/basic-with-memchecks.ll	2 lines
test/Transforms/LoopLoadElim/forward.ll	2 lines
test/Transforms/LoopLoadElim/memcheck.ll	2 lines

Diff 39681

include/llvm/Analysis/ScalarEvolution.h

[...] public:
   SCEVPredicate(const FoldingSetNodeIDRef ID, SCEVPredicateKind Kind);

   virtual ~SCEVPredicate() {}

   SCEVPredicateKind getKind() const { return Kind; }

   /// \brief Returns the estimated complexity of this predicate.
   /// This is roughly measured in the number of run-time checks required.
-  virtual unsigned getComplexity() { return 1; }
+  virtual unsigned getComplexity() const { return 1; }

   /// \brief Returns true if the predicate is always true. This means that no
   /// assumptions were made and nothing needs to be checked at run-time.
   virtual bool isAlwaysTrue() const = 0;

   /// \brief Returns true if this predicate implies \p N.
   virtual bool implies(const SCEVPredicate *N) const = 0;

[...] public:
   /// Implementation of the SCEVPredicate interface
   bool isAlwaysTrue() const override;
   bool implies(const SCEVPredicate *N) const override;
   void print(raw_ostream &OS, unsigned Depth) const override;
   const SCEV *getExpr() const override;

   /// \brief We estimate the complexity of a union predicate as the size
   /// number of predicates in the union.
-  unsigned getComplexity() override { return Preds.size(); }
+  unsigned getComplexity() const override { return Preds.size(); }

   /// Methods for support type inquiry through isa, cast, and dyn_cast:
   static inline bool classof(const SCEVPredicate *P) {
     return P->getKind() == P_Union;
   }
 };

 /// The main scalar evolution driver. Because client code (intentionally)
[...]

include/llvm/Transforms/Utils/LoopVersioning.h

[...]
 // emits checks to prove this.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_TRANSFORMS_UTILS_LOOPVERSIONING_H
 #define LLVM_TRANSFORMS_UTILS_LOOPVERSIONING_H

 #include "llvm/Analysis/LoopAccessAnalysis.h"
+#include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Transforms/Utils/ValueMapper.h"
 #include "llvm/Transforms/Utils/LoopUtils.h"

 namespace llvm {

 class Loop;
 class LoopAccessInfo;
 class LoopInfo;
+class ScalarEvolution;

 /// \brief This class emits a version of the loop where run-time checks ensure
 /// that may-alias pointers can't overlap.
 ///
 /// It currently only supports single-exit loops and assumes that the loop
 /// already has a preheader.
 class LoopVersioning {
 public:
-  /// \brief Expects MemCheck, LoopAccessInfo, Loop, LoopInfo, DominatorTree
-  /// as input. It uses runtime check provided by user.
-  LoopVersioning(SmallVector<RuntimePointerChecking::PointerCheck, 4> Checks,
-                 const LoopAccessInfo &LAI, Loop *L, LoopInfo *LI,
-                 DominatorTree *DT);
-
   /// \brief Expects LoopAccessInfo, Loop, LoopInfo, DominatorTree as input.
-  /// It uses default runtime check provided by LoopAccessInfo.
-  LoopVersioning(const LoopAccessInfo &LAInfo, Loop *L, LoopInfo *LI,
-                 DominatorTree *DT);
+  /// It uses runtime check provided by the user. If \p UseLAIChecks is true,
+  /// we will retain the default checks made by LAI. Otherwise, construct an
+  /// object having no checks and we expect the user to add them.
+  LoopVersioning(const LoopAccessInfo &LAI, Loop *L, LoopInfo *LI,
+                 DominatorTree *DT, ScalarEvolution *SE,
+                 bool UseLAIChecks = true);

   /// \brief Performs the CFG manipulation part of versioning the loop including
   /// the DominatorTree and LoopInfo updates.
   ///
   /// The loop that was used to construct the class will be the "versioned" loop
   /// i.e. the loop that will receive control if all the memchecks pass.
   ///
   /// This allows the loop transform pass to operate on the same loop regardless
[...] public:
   /// loop don't alias (i.e. all memchecks passed). (This loop is actually the
   /// same as the original loop that we got constructed with.)
   Loop *getVersionedLoop() { return VersionedLoop; }

   /// \brief Returns the fall-back loop. Control flows here if pointers in the
   /// loop may alias (i.e. one of the memchecks failed).
   Loop *getNonVersionedLoop() { return NonVersionedLoop; }

+  /// \brief Sets the runtime alias checks for versioning the loop.
+  void setAliasChecks(
+      const SmallVector<RuntimePointerChecking::PointerCheck, 4> Checks);
    [inline comment] anemet: Sorry I didn't realize that by suggesting "add" in the name you would make this "additive". The problem is now we lost the nice "move constructibility" of Checks. How about making this setAliasChecks which would still take a value and then restoring the std::move's. Would a set* interface work for SCEVUnionPredicate?
    [inline comment] sbaranga (author): Ok, I understand. It wouldn't work at the moment for SCEVUnionPredicate. We need to define copy and move constructors for it. I'll create another review for that.
    [inline comment] sbaranga (author): I was confused about the ability of DenseMap to use copy/move constructors, sorry. It turns out we can actually do the same thing as for alias checks.
+
+  /// \brief Sets the runtime SCEV checks for versioning the loop.
+  void setSCEVChecks(SCEVUnionPredicate Check);
+
 private:
   /// \brief Adds the necessary PHI nodes for the versioned loops based on the
   /// loop-defined values used outside of the loop.
   ///
   /// This needs to be called after versionLoop if there are defs in the loop
   /// that are used outside the loop.
   void addPHINodes(const SmallVectorImpl<Instruction *> &DefsUsedOutside);

   /// \brief The original loop. This becomes the "versioned" one. I.e.,
   /// control flows here if pointers in the loop don't alias.
   Loop *VersionedLoop;
   /// \brief The fall-back loop. I.e. control flows here if pointers in the
   /// loop may alias (memchecks failed).
   Loop *NonVersionedLoop;

   /// \brief This maps the instructions from VersionedLoop to their counterpart
   /// in NonVersionedLoop.
   ValueToValueMapTy VMap;

-  /// \brief The set of checks that we are versioning for.
-  SmallVector<RuntimePointerChecking::PointerCheck, 4> Checks;
+  /// \brief The set of alias checks that we are versioning for.
+  SmallVector<RuntimePointerChecking::PointerCheck, 4> AliasChecks;
+
+  /// \brief The set of SCEV checks that we are versioning for.
+  SCEVUnionPredicate Preds;
    [inline comment] anemet: Rename to AliasChecks?

   /// \brief Analyses used.
   const LoopAccessInfo &LAI;
   LoopInfo *LI;
   DominatorTree *DT;
+  ScalarEvolution *SE;
 };
 }

 #endif

lib/Transforms/Scalar/LoopDistribute.cpp

[...] LDistVerify("loop-distribute-verify", cl::Hidden,
                 cl::init(false));

 static cl::opt<bool> DistributeNonIfConvertible(
     "loop-distribute-non-if-convertible", cl::Hidden,
     cl::desc("Whether to distribute into a loop that may not be "
              "if-convertible by the loop vectorizer"),
     cl::init(false));

+static cl::opt<unsigned> DistributeSCEVCheckThreshold(
+    "loop-distribute-scev-check-threshold", cl::init(8), cl::Hidden,
+    cl::desc("The maximum number of SCEV checks allowed for Loop "
+             "Distribution"));
+
 STATISTIC(NumLoopsDistributed, "Number of loops distributed");

 namespace {
 /// \brief Maintains the set of instructions of the loop for a partition before
 /// cloning. After cloning, it hosts the new loop.
 class InstPartition {
   typedef SmallPtrSet<Instruction *, 8> InstructionSet;

[...] public:
   LoopDistribute() : FunctionPass(ID) {
     initializeLoopDistributePass(*PassRegistry::getPassRegistry());
   }

   bool runOnFunction(Function &F) override {
     LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
     LAA = &getAnalysis<LoopAccessAnalysis>();
     DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
+    SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();

     // Build up a worklist of inner-loops to vectorize. This is necessary as the
     // act of distributing a loop creates new loops and can invalidate iterators
     // across the loops.
     SmallVector<Loop *, 8> Worklist;

     for (Loop *TopLevelLoop : *LI)
       for (Loop *L : depth_first(TopLevelLoop))
         // We only handle inner-most loops.
         if (L->empty())
           Worklist.push_back(L);

     // Now walk the identified inner loops.
     bool Changed = false;
     for (Loop *L : Worklist)
       Changed |= processLoop(L);

     // Process each loop nest in the function.
     return Changed;
   }

   void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.addRequired<ScalarEvolutionWrapperPass>();
     AU.addRequired<LoopInfoWrapperPass>();
     AU.addPreserved<LoopInfoWrapperPass>();
     AU.addRequired<LoopAccessAnalysis>();
     AU.addRequired<DominatorTreeWrapperPass>();
     AU.addPreserved<DominatorTreeWrapperPass>();
   }

   static char ID;
[...] bool processLoop(Loop *L) {
     // partition that we set up in the MemoryInstructionDependences loop.
     if (Partitions.mergeToAvoidDuplicatedLoads()) {
       DEBUG(dbgs() << "\nPartitions merged to ensure unique loads:\n"
                    << Partitions);
       if (Partitions.getSize() < 2)
         return false;
     }

+    // Don't distribute the loop if we need too many SCEV run-time checks.
+    const SCEVUnionPredicate &Pred = LAI.Preds;
+    if (Pred.getComplexity() > DistributeSCEVCheckThreshold) {
+      DEBUG(dbgs() << "Too many SCEV run-time checks needed.\n");
+      return false;
+    }
+
     DEBUG(dbgs() << "\nDistributing loop: " << *L << "\n");
     // We're done forming the partitions set up the reverse mapping from
     // instructions to partitions.
     Partitions.setupPartitionIdOnInstructions();

     // To keep things simple have an empty preheader before we version or clone
     // the loop. (Also split if this has no predecessor, i.e. entry, because we
     // rely on PH having a predecessor.)
     if (!PH->getSinglePredecessor() || &*PH->begin() != PH->getTerminator())
       SplitBlock(PH, PH->getTerminator(), DT, LI);

-    // If we need run-time checks to disambiguate pointers are run-time, version
-    // the loop now.
+    // If we need run-time checks, version the loop now.
    [inline comment] anemet: This comment needs updating now since we don't only version to disambiguate pointers anymore.
     auto PtrToPartition = Partitions.computePartitionSetForPointers(LAI);
     const auto *RtPtrChecking = LAI.getRuntimePointerChecking();
     const auto &AllChecks = RtPtrChecking->getChecks();
     auto Checks = includeOnlyCrossPartitionChecks(AllChecks, PtrToPartition,
                                                   RtPtrChecking);
-    if (!Checks.empty()) {
+    if (!Pred.isAlwaysTrue() || !Checks.empty()) {
       DEBUG(dbgs() << "\nPointers:\n");
    [inline comment] anemet: Extra () in the first term.
       DEBUG(LAI.getRuntimePointerChecking()->printChecks(dbgs(), Checks));
-      LoopVersioning LVer(std::move(Checks), LAI, L, LI, DT);
+      LoopVersioning LVer(LAI, L, LI, DT, SE, false);
+      LVer.setAliasChecks(std::move(Checks));
+      LVer.setSCEVChecks(LAI.Preds);
       LVer.versionLoop(DefsUsedOutside);
     }

     // Create identical copies of the original loop for each partition and hook
     // them up sequentially.
     Partitions.cloneLoops(this);

     // Now, we remove the instruction from each loop that don't belong to that
[...] bool processLoop(Loop *L) {
     ++NumLoopsDistributed;
     return true;
   }

   // Analyses used.
   LoopInfo *LI;
   LoopAccessAnalysis *LAA;
   DominatorTree *DT;
+  ScalarEvolution *SE;
 };
 } // anonymous namespace

 char LoopDistribute::ID;
 static const char ldist_name[] = "Loop Distribition";

 INITIALIZE_PASS_BEGIN(LoopDistribute, LDIST_NAME, ldist_name, false, false)
 INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(LoopAccessAnalysis)
 INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)
 INITIALIZE_PASS_END(LoopDistribute, LDIST_NAME, ldist_name, false, false)

 namespace llvm {
 FunctionPass *createLoopDistributePass() { return new LoopDistribute(); }
 }

lib/Transforms/Scalar/LoopLoadElimination.cpp

Show All 35 Lines

using namespace llvm;		using namespace llvm;

static cl::opt<unsigned> CheckPerElim(		static cl::opt<unsigned> CheckPerElim(
"runtime-check-per-loop-load-elim", cl::Hidden,		"runtime-check-per-loop-load-elim", cl::Hidden,
cl::desc("Max number of memchecks allowed per eliminated load on average"),		cl::desc("Max number of memchecks allowed per eliminated load on average"),
cl::init(1));		cl::init(1));

		static cl::opt<unsigned> LoadElimSCEVCheckThreshold(
		"loop-load-elimination-scev-check-threshold", cl::init(8), cl::Hidden,
		cl::desc("The maximum number of SCEV checks allowed for Loop "
		"Load Elimination"));


STATISTIC(NumLoopLoadEliminted, "Number of loads eliminated by LLE");		STATISTIC(NumLoopLoadEliminted, "Number of loads eliminated by LLE");

namespace {		namespace {

/// \brief Represent a store-to-forwarding candidate.		/// \brief Represent a store-to-forwarding candidate.
struct StoreToLoadForwardingCandidate {		struct StoreToLoadForwardingCandidate {
LoadInst *Load;		LoadInst *Load;
StoreInst *Store;		StoreInst *Store;
[...]
SmallVector<RuntimePointerChecking::PointerCheck, 4> Checks =		SmallVector<RuntimePointerChecking::PointerCheck, 4> Checks =
collectMemchecks(Candidates);		collectMemchecks(Candidates);

// Too many checks are likely to outweigh the benefits of forwarding.		// Too many checks are likely to outweigh the benefits of forwarding.
if (Checks.size() > Candidates.size() * CheckPerElim) {		if (Checks.size() > Candidates.size() * CheckPerElim) {
DEBUG(dbgs() << "Too many run-time checks needed.\n");		DEBUG(dbgs() << "Too many run-time checks needed.\n");
return false;		return false;
}		}

		if (LAI.Preds.getComplexity() > LoadElimSCEVCheckThreshold) {
		DEBUG(dbgs() << "Too many SCEV run-time checks needed.\n");
		return false;
		}

// Point of no-return, start the transformation. First, version the loop if		// Point of no-return, start the transformation. First, version the loop if
// necessary.		// necessary.
if (!Checks.empty()) {		if (!Checks.empty() \|\| !LAI.Preds.isAlwaysTrue()) {
LoopVersioning LV(std::move(Checks), LAI, L, LI, DT);		LoopVersioning LV(LAI, L, LI, DT, SE, false);
		LV.setAliasChecks(std::move(Checks));
		LV.setSCEVChecks(LAI.Preds);
LV.versionLoop();		LV.versionLoop();
}		}

// Next, propagate the value stored by the store to the users of the load.		// Next, propagate the value stored by the store to the users of the load.
// Also for the first iteration, generate the initial value of the load.		// Also for the first iteration, generate the initial value of the load.
SCEVExpander SEE(*SE, L->getHeader()->getModule()->getDataLayout(),		SCEVExpander SEE(*SE, L->getHeader()->getModule()->getDataLayout(),
"storeforward");		"storeforward");
for (const auto &Cand : Candidates)		for (const auto &Cand : Candidates)

lib/Transforms/Utils/LoopVersioning.cpp

	// emits checks to prove this.			// emits checks to prove this.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "llvm/Transforms/Utils/LoopVersioning.h"			#include "llvm/Transforms/Utils/LoopVersioning.h"

	#include "llvm/Analysis/LoopAccessAnalysis.h"			#include "llvm/Analysis/LoopAccessAnalysis.h"
	#include "llvm/Analysis/LoopInfo.h"			#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/ScalarEvolutionExpander.h"
	#include "llvm/IR/Dominators.h"			#include "llvm/IR/Dominators.h"
	#include "llvm/Transforms/Utils/BasicBlockUtils.h"			#include "llvm/Transforms/Utils/BasicBlockUtils.h"
	#include "llvm/Transforms/Utils/Cloning.h"			#include "llvm/Transforms/Utils/Cloning.h"

	using namespace llvm;			using namespace llvm;

	LoopVersioning::LoopVersioning(			LoopVersioning::LoopVersioning(const LoopAccessInfo &LAI, Loop L, LoopInfo LI,
	SmallVector<RuntimePointerChecking::PointerCheck, 4> Checks,			DominatorTree DT, ScalarEvolution SE,
	const LoopAccessInfo &LAI, Loop L, LoopInfo LI, DominatorTree *DT)			bool UseLAIChecks)
	: VersionedLoop(L), NonVersionedLoop(nullptr), Checks(std::move(Checks)),			: VersionedLoop(L), NonVersionedLoop(nullptr), LAI(LAI), LI(LI), DT(DT),
	LAI(LAI), LI(LI), DT(DT) {			SE(SE) {
	assert(L->getExitBlock() && "No single exit block");			assert(L->getExitBlock() && "No single exit block");
	assert(L->getLoopPreheader() && "No preheader");			assert(L->getLoopPreheader() && "No preheader");
				if (UseLAIChecks) {
				setAliasChecks(LAI.getRuntimePointerChecking()->getChecks());
				setSCEVChecks(LAI.Preds);
				}
	}			}
anemet (inline comment): Now that we optionally feed LVers two sets of checks, we should probably move away from taking these in the ctor. I think we should have two members like addAliasChecks and addSCEVChecks or something. I also think we should take SCEVPredUnion rather than LAI. The second ctor below is fine because the idea there is to take everything from LAI without filtering it.
sbaranga (author, inline reply): Removing the Checks parameter made the two constructors have identical arguments. Since we want to be able to construct with all the checks from the LAI, I've added a bool parameter to enable this.
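
The API shape discussed in the two comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: `AnalysisInfo` and `Versioner` are hypothetical stand-ins (not the real `LoopAccessInfo` or `LoopVersioning` classes), and checks are modelled as plain strings. It shows a single constructor whose bool flag either copies every check from the analysis or leaves the object empty so the client can install a filtered subset through setters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the analysis result: everything the analysis
// found, unfiltered. Checks are modelled as strings for illustration only.
struct AnalysisInfo {
  std::vector<std::string> AliasChecks;
  std::vector<std::string> SCEVChecks;
};

// Hypothetical stand-in for the versioning utility.
class Versioner {
public:
  // With UseAllChecks == true, take every check from the analysis without
  // filtering. With false, start empty; the client is expected to call the
  // setters with whatever subset it actually needs.
  Versioner(const AnalysisInfo &Info, bool UseAllChecks) {
    if (UseAllChecks) {
      setAliasChecks(Info.AliasChecks);
      setSCEVChecks(Info.SCEVChecks);
    }
  }

  // Taking the vectors by value lets callers either copy or move into them.
  void setAliasChecks(std::vector<std::string> Checks) {
    AliasChecks = std::move(Checks);
  }
  void setSCEVChecks(std::vector<std::string> Checks) {
    SCEVChecks = std::move(Checks);
  }

  // Versioning is only worthwhile if at least one check has to be emitted.
  bool needsVersioning() const {
    return !AliasChecks.empty() || !SCEVChecks.empty();
  }

  std::size_t numChecks() const {
    return AliasChecks.size() + SCEVChecks.size();
  }

private:
  std::vector<std::string> AliasChecks;
  std::vector<std::string> SCEVChecks;
};
```

In this sketch, a client that filters its checks (as Loop Distribute does with cross-partition alias checks) would pass `false` and then call the setters, while a client that wants everything would pass `true` and call nothing else.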

	LoopVersioning::LoopVersioning(const LoopAccessInfo &LAInfo, Loop *L,			void LoopVersioning::setAliasChecks(
	LoopInfo LI, DominatorTree DT)			const SmallVector<RuntimePointerChecking::PointerCheck, 4> Checks) {
	: VersionedLoop(L), NonVersionedLoop(nullptr),			AliasChecks = std::move(Checks);
	Checks(LAInfo.getRuntimePointerChecking()->getChecks()), LAI(LAInfo),			}
	LI(LI), DT(DT) {
	assert(L->getExitBlock() && "No single exit block");			void LoopVersioning::setSCEVChecks(SCEVUnionPredicate Check) {
	assert(L->getLoopPreheader() && "No preheader");			Preds = std::move(Check);
	}			}

	void LoopVersioning::versionLoop(			void LoopVersioning::versionLoop(
	const SmallVectorImpl<Instruction *> &DefsUsedOutside) {			const SmallVectorImpl<Instruction *> &DefsUsedOutside) {
	Instruction *FirstCheckInst;			Instruction *FirstCheckInst;
	Instruction *MemRuntimeCheck;			Instruction *MemRuntimeCheck;
				Value *SCEVRuntimeCheck;
				Value *RuntimeCheck = nullptr;

	// Add the memcheck in the original preheader (this is empty initially).			// Add the memcheck in the original preheader (this is empty initially).
	BasicBlock *MemCheckBB = VersionedLoop->getLoopPreheader();			BasicBlock *RuntimeCheckBB = VersionedLoop->getLoopPreheader();
anemet (inline comment): s/MemCheckBB/RuntimeCheckBB
	std::tie(FirstCheckInst, MemRuntimeCheck) =			std::tie(FirstCheckInst, MemRuntimeCheck) =
	LAI.addRuntimeChecks(MemCheckBB->getTerminator(), Checks);			LAI.addRuntimeChecks(RuntimeCheckBB->getTerminator(), AliasChecks);
	assert(MemRuntimeCheck && "called even though needsAnyChecking = false");			assert(MemRuntimeCheck && "called even though needsAnyChecking = false");

				const SCEVUnionPredicate &Pred = LAI.Preds;
				SCEVExpander Exp(*SE, RuntimeCheckBB->getModule()->getDataLayout(),
				"scev.check");
				SCEVRuntimeCheck =
				Exp.expandCodeForPredicate(&Pred, RuntimeCheckBB->getTerminator());
				auto *CI = dyn_cast<ConstantInt>(SCEVRuntimeCheck);

				// Discard the SCEV runtime check if it is always true.
				if (CI && CI->isZero())
				SCEVRuntimeCheck = nullptr;

				if (MemRuntimeCheck && SCEVRuntimeCheck) {
				RuntimeCheck = BinaryOperator::Create(Instruction::Or, MemRuntimeCheck,
				SCEVRuntimeCheck, "ldist.safe");
				if (auto *I = dyn_cast<Instruction>(RuntimeCheck))
				I->insertBefore(RuntimeCheckBB->getTerminator());
				} else
				RuntimeCheck = MemRuntimeCheck ? MemRuntimeCheck : SCEVRuntimeCheck;

				assert(RuntimeCheck && "called even though we don't need "
				"any runtime checks");

	// Rename the block to make the IR more readable.			// Rename the block to make the IR more readable.
	MemCheckBB->setName(VersionedLoop->getHeader()->getName() + ".lver.memcheck");			RuntimeCheckBB->setName(VersionedLoop->getHeader()->getName() +
				".lver.check");

	// Create empty preheader for the loop (and after cloning for the			// Create empty preheader for the loop (and after cloning for the
	// non-versioned loop).			// non-versioned loop).
	BasicBlock *PH = SplitBlock(MemCheckBB, MemCheckBB->getTerminator(), DT, LI);			BasicBlock *PH =
				SplitBlock(RuntimeCheckBB, RuntimeCheckBB->getTerminator(), DT, LI);
	PH->setName(VersionedLoop->getHeader()->getName() + ".ph");			PH->setName(VersionedLoop->getHeader()->getName() + ".ph");

	// Clone the loop including the preheader.			// Clone the loop including the preheader.
	//			//
	// FIXME: This does not currently preserve SimplifyLoop because the exit			// FIXME: This does not currently preserve SimplifyLoop because the exit
	// block is a join between the two loops.			// block is a join between the two loops.
	SmallVector<BasicBlock *, 8> NonVersionedLoopBlocks;			SmallVector<BasicBlock *, 8> NonVersionedLoopBlocks;
	NonVersionedLoop =			NonVersionedLoop =
	cloneLoopWithPreheader(PH, MemCheckBB, VersionedLoop, VMap, ".lver.orig",			cloneLoopWithPreheader(PH, RuntimeCheckBB, VersionedLoop, VMap,
	LI, DT, NonVersionedLoopBlocks);			".lver.orig", LI, DT, NonVersionedLoopBlocks);
	remapInstructionsInBlocks(NonVersionedLoopBlocks, VMap);			remapInstructionsInBlocks(NonVersionedLoopBlocks, VMap);

	// Insert the conditional branch based on the result of the memchecks.			// Insert the conditional branch based on the result of the memchecks.
	Instruction *OrigTerm = MemCheckBB->getTerminator();			Instruction *OrigTerm = RuntimeCheckBB->getTerminator();
	BranchInst::Create(NonVersionedLoop->getLoopPreheader(),			BranchInst::Create(NonVersionedLoop->getLoopPreheader(),
	VersionedLoop->getLoopPreheader(), MemRuntimeCheck,			VersionedLoop->getLoopPreheader(), RuntimeCheck, OrigTerm);
	OrigTerm);
	OrigTerm->eraseFromParent();			OrigTerm->eraseFromParent();

	// The loops merge in the original exit block. This is now dominated by the			// The loops merge in the original exit block. This is now dominated by the
	// memchecking block.			// memchecking block.
	DT->changeImmediateDominator(VersionedLoop->getExitBlock(), MemCheckBB);			DT->changeImmediateDominator(VersionedLoop->getExitBlock(), RuntimeCheckBB);

	// Adds the necessary PHI nodes for the versioned loops based on the			// Adds the necessary PHI nodes for the versioned loops based on the
	// loop-defined values used outside of the loop.			// loop-defined values used outside of the loop.
	addPHINodes(DefsUsedOutside);			addPHINodes(DefsUsedOutside);
	}			}

	void LoopVersioning::addPHINodes(			void LoopVersioning::addPHINodes(
	const SmallVectorImpl<Instruction *> &DefsUsedOutside) {			const SmallVectorImpl<Instruction *> &DefsUsedOutside) {

test/Transforms/LoopDistribute/basic-with-memchecks.ll

[...]
entry:		entry:
%d = load i32, i32* @D, align 8		%d = load i32, i32* @D, align 8
%e = load i32, i32* @E, align 8		%e = load i32, i32* @E, align 8
br label %for.body		br label %for.body

; We have two compares for each array overlap check.		; We have two compares for each array overlap check.
; Since the checks to A and A + 4 get merged, this will give us a		; Since the checks to A and A + 4 get merged, this will give us a
; total of 8 compares.		; total of 8 compares.
;		;
; CHECK: for.body.lver.memcheck:		; CHECK: for.body.lver.check:
; CHECK: = icmp		; CHECK: = icmp
; CHECK: = icmp		; CHECK: = icmp

; CHECK: = icmp		; CHECK: = icmp
; CHECK: = icmp		; CHECK: = icmp

; CHECK: = icmp		; CHECK: = icmp
; CHECK: = icmp		; CHECK: = icmp

test/Transforms/LoopLoadElim/forward.ll

	; RUN: opt -loop-load-elim -S < %s \| FileCheck %s			; RUN: opt -loop-load-elim -S < %s \| FileCheck %s

	; Simple st->ld forwarding derived from a lexical forwrad dep.			; Simple st->ld forwarding derived from a lexical forwrad dep.
	;			;
	; for (unsigned i = 0; i < 100; i++) {			; for (unsigned i = 0; i < 100; i++) {
	; A[i+1] = B[i] + 2;			; A[i+1] = B[i] + 2;
	; C[i] = A[i] * 2;			; C[i] = A[i] * 2;
	; }			; }

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	define void @f(i32* %A, i32* %B, i32* %C, i64 %N) {			define void @f(i32* %A, i32* %B, i32* %C, i64 %N) {

	; CHECK: for.body.lver.memcheck:			; CHECK: for.body.lver.check:
	; CHECK: %found.conflict{{.*}} =			; CHECK: %found.conflict{{.*}} =
	; CHECK-NOT: %found.conflict{{.*}} =			; CHECK-NOT: %found.conflict{{.*}} =

	entry:			entry:
	; for.body.ph:			; for.body.ph:
	; CHECK: %load_initial = load i32, i32* %A			; CHECK: %load_initial = load i32, i32* %A
	br label %for.body			br label %for.body


test/Transforms/LoopLoadElim/memcheck.ll

	; }			; }

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	define void @f(i32* %A, i32* %B, i32* %C, i64 %N, i32* %D) {			define void @f(i32* %A, i32* %B, i32* %C, i64 %N, i32* %D) {
	entry:			entry:
	br label %for.body			br label %for.body

	; AGGRESSIVE: for.body.lver.memcheck:			; AGGRESSIVE: for.body.lver.check:
	; AGGRESSIVE: %found.conflict{{.*}} =			; AGGRESSIVE: %found.conflict{{.*}} =
	; AGGRESSIVE: %found.conflict{{.*}} =			; AGGRESSIVE: %found.conflict{{.*}} =
	; AGGRESSIVE-NOT: %found.conflict{{.*}} =			; AGGRESSIVE-NOT: %found.conflict{{.*}} =

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	; CHECK-NOT: %store_forwarded =			; CHECK-NOT: %store_forwarded =
	; AGGRESSIVE: %store_forwarded =			; AGGRESSIVE: %store_forwarded =
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]

Revision Contents

Diff 39681

include/llvm/Analysis/ScalarEvolution.h

include/llvm/Transforms/Utils/LoopVersioning.h

lib/Transforms/Scalar/LoopDistribute.cpp

lib/Transforms/Scalar/LoopLoadElimination.cpp

lib/Transforms/Utils/LoopVersioning.cpp

test/Transforms/LoopDistribute/basic-with-memchecks.ll

test/Transforms/LoopLoadElim/forward.ll

test/Transforms/LoopLoadElim/memcheck.ll
