This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Add support of a new optimization case to Loop Versioning for LICM + code clean up
Needs Review · Public

Authored by eastig on Sep 26 2016, 11:56 AM.

Details

Reviewers

anemet
hfinkel
ashutosh.nema

Summary

At the moment, Loop Versioning for LICM does not support the following loops, which, if versioned, improve benchmark scores on Cortex-M7 by roughly 18-40%:

void mem_copy_01(char **dst, char **src, int bytes_count) {
    while (bytes_count--)
    {
      *((*dst)++) = *((*src)++);
    }
}

void mem_copy_02(char **dst, char *src, int bytes_count) {
    while (bytes_count--)
    {
      *((*dst)++) = *src++;
    }
}

void mem_copy_03(char *dst, char **src, int bytes_count) {
    while (bytes_count--)
    {
      *dst++ = *((*src)++);
    }
}

IR of mem_copy_01:

define void @mem_copy_01(i8** nocapture %dst, i8** nocapture %src, i32 %bytes_count) {
entry:
  %tobool2 = icmp eq i32 %bytes_count, 0
  br i1 %tobool2, label %while.end, label %while.body

while.body:                                       ; preds = %entry, %while.body
  %bytes_count.addr.03 = phi i32 [ %dec, %while.body ], [ %bytes_count, %entry ]
  %dec = add nsw i32 %bytes_count.addr.03, -1
  %0 = load i8*, i8** %src, align 4, !tbaa !3
  %incdec.ptr = getelementptr inbounds i8, i8* %0, i32 1
  store i8* %incdec.ptr, i8** %src, align 4, !tbaa !3
  %1 = load i8, i8* %0, align 1, !tbaa !7
  %2 = load i8*, i8** %dst, align 4, !tbaa !3
  %incdec.ptr1 = getelementptr inbounds i8, i8* %2, i32 1
  store i8* %incdec.ptr1, i8** %dst, align 4, !tbaa !3
  store i8 %1, i8* %2, align 1, !tbaa !7
  %tobool = icmp eq i32 %dec, 0
  br i1 %tobool, label %while.end, label %while.body

while.end:                                        ; preds = %while.body, %entry
  ret void
}

LoopAccessAnalysis can create aliasing checks for src and dst but not for *src and *dst, because *src and *dst are loaded from memory. The IR above shows how these pointers are defined and used (InvPtr is a loop-invariant pointer):

Ptr = Load(InvPtr)
NewPtr = GEP(Ptr, Const)
Store(NewPtr, InvPtr)
Mem_operations using Ptr

If Ptr and InvPtr are not aliased at iteration N, then at iteration N+1 the value of Ptr is the value defined by the GEP instruction.
Without aliasing, Ptr takes values from [Ptr0, Ptr0 + (number_of_iterations-1) * type_size * GEP_index], where Ptr0 is Load(InvPtr) at the first iteration.

Absence of aliasing means:

4_or_8_bytes_aligned(Ptr0) != InvPtr                         : iteration 1
4_or_8_bytes_aligned(Ptr0 + type_size*GEP_index) != InvPtr   : iteration 2
4_or_8_bytes_aligned(Ptr0 + 2*type_size*GEP_index) != InvPtr : iteration 3
...

The aligned value of Ptr0 is used because InvPtr is a pointer to a pointer, so it is aligned to either 4 or 8 bytes.
We can write a stricter check:
InvPtr is not in [4_or_8_bytes_aligned(Ptr0), Ptr0 + (number_of_iterations-1) * type_size * GEP_index]
which guarantees that all the checks above are satisfied.
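
As a rough illustration of the stricter range check above, here is a minimal C++ sketch. The function name, the parameter names, and the use of plain 64-bit integers for addresses are assumptions made for this example and are not taken from the patch. It also assumes a positive step (the pointer moves upwards), at least one iteration, and no address overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the patch.
// Returns true when the pointer slot at InvPtr lies outside the closed range
// [align_down(Ptr0, PtrSize), Ptr0 + (Iterations - 1) * StepBytes].
//   InvPtr      address of the loop-invariant pointer slot (PtrSize-aligned)
//   Ptr0        value loaded from the slot before the first iteration
//   Iterations  number of loop iterations, assumed >= 1
//   StepBytes   type_size * GEP_index, assumed > 0
//   PtrSize     pointer size in bytes, 4 or 8
bool slotOutsideAccessedRange(std::uint64_t InvPtr, std::uint64_t Ptr0,
                              std::uint64_t Iterations,
                              std::uint64_t StepBytes,
                              std::uint64_t PtrSize) {
  assert(Iterations >= 1 && StepBytes > 0);
  assert(PtrSize == 4 || PtrSize == 8);
  assert(InvPtr % PtrSize == 0);

  // Round Ptr0 down to the pointer alignment, as in the summary.
  std::uint64_t Low = Ptr0 & ~(PtrSize - 1);
  // Last value Ptr takes inside the loop.
  std::uint64_t High = Ptr0 + (Iterations - 1) * StepBytes;

  return InvPtr < Low || InvPtr > High;
}
```

Rounding the lower bound down matters when Ptr0 itself is not pointer-aligned: the first access may then land in the middle of the slot, which a plain comparison against Ptr0 would miss.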

We check aliasing only between pointers loaded from invariant locations and pointers to those locations, which is enough to decide whether operations on invariant pointers can be moved out of a loop. Because these checks exist for the purpose of LICM and do not cover all pointer combinations, they cannot be created or added in LoopAccessAnalysis/LoopVersioning. Instead, LoopAccessAnalysis/LoopVersioning should provide a means of processing unrecognized pointers and adding checks for them.
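
To make the intended effect concrete, the following is a hand-written source-level sketch of what a versioned mem_copy_01 could look like. It is only an illustration, not the code the pass emits: the names mem_copy_01_versioned and disjointFromSlot are invented here, the check uses a direct range-overlap test rather than the aligned-Ptr0 form, and it assumes bytes_count is non-negative and that the pointer slots are naturally aligned.

```cpp
#include <cassert>
#include <cstdint>

// Original loop from the summary.
void mem_copy_01(char **dst, char **src, int bytes_count) {
  while (bytes_count--)
    *((*dst)++) = *((*src)++);
}

// Hypothetical helper: true when the byte range [Begin, Begin + Len) does not
// overlap the pointer-sized slot located at Slot.
static bool disjointFromSlot(std::uintptr_t Begin, std::uintptr_t Len,
                             const void *Slot) {
  std::uintptr_t S = reinterpret_cast<std::uintptr_t>(Slot);
  return Begin + Len <= S || S + sizeof(char *) <= Begin;
}

// Hand-versioned sketch: a runtime check selects between a loop with the
// pointer loads/stores hoisted out and the original loop.
void mem_copy_01_versioned(char **dst, char **src, int bytes_count) {
  assert(bytes_count >= 0); // assumption of this sketch
  std::uintptr_t N = static_cast<std::uintptr_t>(bytes_count);
  std::uintptr_t D0 = reinterpret_cast<std::uintptr_t>(*dst);
  std::uintptr_t S0 = reinterpret_cast<std::uintptr_t>(*src);

  // Neither accessed byte range may touch either pointer slot, and the two
  // slots must be distinct.
  bool NoAlias = dst != src &&
                 disjointFromSlot(D0, N, dst) && disjointFromSlot(D0, N, src) &&
                 disjointFromSlot(S0, N, dst) && disjointFromSlot(S0, N, src);
  if (!NoAlias) {
    mem_copy_01(dst, src, bytes_count); // conservative fallback
    return;
  }

  // Fast path: loads of *dst and *src are hoisted, stores are sunk.
  char *D = *dst;
  char *S = *src;
  for (int I = 0; I < bytes_count; ++I)
    *D++ = *S++;
  *dst = D;
  *src = S;
}
```

In the fast path the loop body no longer reads or writes the pointer slots, which is the kind of hoisting LICM can perform once the runtime checks have ruled out aliasing.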

Summary of changes:

Clean up of the code of Loop Versioning for LICM. See comments to the changes below.
LoopVersioning::versionLoop functions are changed to return the BasicBlock where RT checks are inserted. The returned basic block can be used for inserting additional checks.
LoopAccessAnalysis can operate in a 'PartialCheckingAllowed' state, which means it creates RT checks for recognized pointers and collects unrecognized pointers. The unrecognized pointers can be processed later by a user of LAA.
Recognition of the new optimization case is added to Loop Versioning for LICM.
New tests are added.
Old tests are updated.

Diff Detail

Build Status

Buildable 114
Build 114: arc lint + arc unit

Event Timeline

eastig updated this revision to Diff 72477.Sep 26 2016, 11:56 AM

eastig retitled this revision from to [LICM] Add support of a new optimization case to Loop Versioning for LICM + code clean up.

eastig updated this object.

eastig added reviewers: hfinkel, anemet, ashutosh.nema.

eastig added subscribers: llvm-commits, jmolloy.

Herald added subscribers: mzolotukhin, sanjoy. Sep 26 2016, 11:56 AM

eastig added inline comments.Sep 26 2016, 12:24 PM

lib/Transforms/Scalar/LoopVersioningLICM.cpp
17	We can mark only the memory operations for which RT checks are created.
161	TargetLibraryInfo is not used in the code.
172	Removed unused fields or made them function scoped instead of class scoped.
189	Made class fields private because there is no need to have them public.
305	TypeCheck is needed when we deal with an AliasSet with MustAlias. E.g. char * and char ** pointers may be aliased but they have different types.
502	Loop Versioning provides API for annotating a loop with 'no alias' metadata. It's better not to duplicate this functionality.
524	This is not safe. Replaced with unique_ptr.
test/Transforms/LoopVersioningLICM/loopversioningLICM1.ll
15	Meaningless metadata. The instruction is not aliased with itself by default. Also it's better not to hard-code ids as they can change.

Thanks Evgeny for working on this.

Currently, when pointer bounds are unknown we don’t consider them
for the alias check and we avoid versioning. This change introduces a partial
alias check, which may be risky for applications. I'm wondering what the purpose
of the partial alias check is.

lib/Transforms/Scalar/LoopVersioningLICM.cpp
403	This function looks for a very specific pattern; it needs to be generalized.
431	Why restrict this to a single load whose users are memory operations and GEPs?
446	Why one use?
489	The current LoopVersioningLICM supports loop nests, i.e. it targets the innermost loop. This behavior should stay supported.
496	Earlier, writes to memory were collected as well. Why not consider them here?

In D24934#553702, @ashutosh.nema wrote:

Thanks Evgeny for working on this.

Currently, when pointer bounds are unknown we don’t consider them
for the alias check and we avoid versioning. This change introduces a partial
alias check, which may be risky for applications. I'm wondering what the purpose
of the partial alias check is.

The purpose of the partial alias checking is to split responsibility between LoopAccessAnalysis and its user. LoopAccessAnalysis is a very powerful tool but it tries to cover all pointers in a loop which is not always possible. IMHO it's up to the user of LoopAccessAnalysis what to do with unrecognized pointers.
E.g. if there are three pointers:

LoopInvariantPointer
PointerA
PointerB

then LICM can move operations on LoopInvariantPointer out of a loop if it's proven that LoopInvariantPointer is not aliased with PointerA and PointerB. We don't need to check aliasing of PointerA and PointerB. At the moment, if any of the pointers is not recognized, no checks are created. Why not allow the results of the analysis to be reused? Actually, I would like to have more control over which RT checks are finally created.
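
The difference between checking every pair and checking only what LICM needs can be sketched with a small toy model. This is not LLVM API: PtrDesc, allPairs, and licmPairs are names invented for this example, and the model ignores details that LAA does take into account (read/write kinds, dependence sets, alias sets).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy description of a pointer used in a loop (hypothetical type).
struct PtrDesc {
  const char *Name;
  bool IsLoopInvariant;
};

using CheckPair = std::pair<std::size_t, std::size_t>;

// Policy 1: a runtime check for every pair of pointers.
std::vector<CheckPair> allPairs(const std::vector<PtrDesc> &Ptrs) {
  std::vector<CheckPair> Checks;
  for (std::size_t I = 0; I < Ptrs.size(); ++I)
    for (std::size_t J = I + 1; J < Ptrs.size(); ++J)
      Checks.emplace_back(I, J);
  return Checks;
}

// Policy 2: only pairs that involve a loop-invariant pointer, which is what
// hoisting operations on that pointer requires.
std::vector<CheckPair> licmPairs(const std::vector<PtrDesc> &Ptrs) {
  std::vector<CheckPair> Checks;
  for (std::size_t I = 0; I < Ptrs.size(); ++I)
    for (std::size_t J = I + 1; J < Ptrs.size(); ++J)
      if (Ptrs[I].IsLoopInvariant || Ptrs[J].IsLoopInvariant)
        Checks.emplace_back(I, J);
  return Checks;
}
```

For the three pointers listed above, the first policy yields three checks while the second yields two, skipping the PointerA/PointerB pair.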

eastig added inline comments.Sep 27 2016, 8:15 AM

lib/Transforms/Scalar/LoopVersioningLICM.cpp
403	Yes, it's currently for a very specific pattern which we've got from benchmarks and real code. To generalize the function I need to have more patterns.
431	Because this is the current pattern we want to recognize. More uses might create a sophisticated DFG which is not worth analyzing. Such cases should be discovered first and then analyzed.
446	The same answer as above. Lack of real use cases.
489	The old cases work as before, nothing is changed. The behaviour is only changed for the new cases. Loads are very heavy operations, so it is too dangerous to move them from an inner loop to an outer loop: the outer loop might have 1000 iterations and the inner loop might have 10. If there are aliased pointers, an operation loading a pointer will be executed 11000 times instead of 10000 times: 1000 times in the RT checking basic block + 10000 times in the original loop. To make a proper decision an execution profile should be used.
496	LoopAccessAnalysis collects pointers in terms of Values. Writes are users of those pointers. So for each pointer in a loop we have an instruction defining it.
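
The load counts in the comment on line 489 can be reproduced with a small calculation. The struct and function names below are invented for this sketch. The aliased case follows the arithmetic in the comment; the no-alias figure is an additional assumption of this example (one load in the check block plus one hoisted load per entry to the inner loop), not a number stated in the review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model: number of executed pointer loads.
struct LoadCounts {
  std::uint64_t Original;         // no versioning at all
  std::uint64_t VersionedAliased; // checks fail, original inner loop runs
  std::uint64_t VersionedNoAlias; // checks pass, load hoisted (assumption)
};

LoadCounts countPointerLoads(std::uint64_t OuterTrips,
                             std::uint64_t InnerTrips) {
  LoadCounts C;
  // One pointer load per inner iteration.
  C.Original = OuterTrips * InnerTrips;
  // One extra load in the RT check block per outer iteration.
  C.VersionedAliased = OuterTrips + OuterTrips * InnerTrips;
  // Assumed: one load for the check and one hoisted load per outer iteration.
  C.VersionedNoAlias = OuterTrips + OuterTrips;
  return C;
}
```

With 1000 outer and 10 inner iterations this gives 10000 loads originally and 11000 in the aliased versioned case, matching the comment. Swapping the trip counts (10 outer, 1000 inner) shrinks the overhead of the failed check to 10 extra loads, which shows why the trade-off depends on the execution profile.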

Ping

ashutosh.nema added inline comments.Oct 4 2016, 1:14 AM

lib/Transforms/Scalar/LoopVersioningLICM.cpp
489	I agree it does not change the existing behavior, but why are you enforcing such a restriction for the new cases? In your example it can be the other way around as well, where the inner loop has 1000 iterations and the outer loop has 10 iterations, and it's actually beneficial to hoist the load from the inner loop to the outer loop. LoopVersioning LICM does not make any hoisting decision; the actual decision and hoisting will be done later by LICM. Are you getting issues/degrades by allowing inner loops ?

eastig added inline comments.Oct 4 2016, 1:57 AM

lib/Transforms/Scalar/LoopVersioningLICM.cpp
489	In reply to "Are you getting issues/degrades by allowing inner loops ?": No, I have not seen any issues or performance regressions yet. I finally made runs with the pass enabled by default as a part of the middle-end. Before that, I had manually run opt with the pass and then llc. I've got +20%...+48.5% performance improvement. This restriction was based purely on my experience of implementing different optimizations for loop nests. I will remove the restriction.

Removed the restriction not to apply the optimization in case of loop nest.

Ping

I just had a brief look through this, but I don't think LAA changes are correct (see the inline comment).
Also if there are non-functional changes here, could you split them into another review (this would make things easier to understand)?

Cheers,
Silviu

lib/Analysis/LoopAccessAnalysis.cpp
664	Won't this just make canCheckPtrAtRT say that it can ignore all pointers with unknown bounds? I think this would be incorrect for all other LAA users.

eastig added inline comments.Oct 10 2016, 3:01 PM

lib/Analysis/LoopAccessAnalysis.cpp
664	They are not ignored. They are collected to be post-processed. The default behaviour is not to create any RT checks if a pointer with unknown bounds has been found. I think this is too strict. What if I don't want to discard recognized pointers and the RT checks for them? Any idea how to implement this without duplicating the code? The default behaviour is not changed if a LAA user does not want to have information about pointers with unknown bounds. I don't see how the default behaviour can be changed unintentionally. LAA was originally developed for the Loop Vectorizer. But now it is suggested to be used as the loop memory dependence framework. The problem is that the Loop Vectorizer specifics are not removed. The current users of LAA, besides the Loop Vectorizer, are the LoopDistribute and LoopLoadElimination passes. They are also for vectorizing loops.

anemet added inline comments.Oct 10 2016, 3:17 PM

lib/Analysis/LoopAccessAnalysis.cpp
664	It would be good if you could describe what behavior you're trying to achieve. If I remember correctly both LDist and LLE post-process the full set of run-time checks, so you probably want something similar. (Sorry if that is what you're doing. I haven't looked at the patch because it has many unrelated changes.) Regarding "LAA was originally developed for the Loop Vectorizer. But now it is suggested to be used as the loop memory dependence framework. The problem is that the Loop Vectorizer specifics are not removed.": this is not accurate and it's misleading. Yes, the LV-specific APIs were retained but new generic APIs were developed. The LV-specific APIs are now typically formulated in terms of the generic APIs. It was done this way to allow incremental migration. If the LV-specific APIs are misleading we could move them to LV.

sbaranga added inline comments.Oct 11 2016, 2:21 AM

lib/Analysis/LoopAccessAnalysis.cpp
664	The result of this analysis is being cached (see LoopAccessLegacyAnalysis::getInfo) , so it will have whatever flags the first caller uses and subsequent calls won't recompute it. Since the other users of LAA don't know about this interface change, it will cause them to assume that the pointers with unknown bounds can be checked (even though we won't emit any checks for them). It looks like for your case you could keep the CanDoRT = false; (so there's no interface change), record the pointers that cannot be checked and not clear the RTCheck when we can't do all the checks (a few lines below). You would probably need to also clear it at some other point (maybe at the start of canCheckPtrAtRT?).

eastig added inline comments.Oct 11 2016, 3:33 AM

lib/Analysis/LoopAccessAnalysis.cpp
664	Thank you, Silviu. Now I see the issue.
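
The caching problem described by sbaranga in this thread can be illustrated with a toy model. The classes below are stand-ins invented for this sketch and are not the LLVM classes; they only mimic the shape of a cache keyed by loop whose entries are built from a per-call flag.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Toy analysis result (hypothetical), remembering the flag it was built with.
struct ToyLoopInfo {
  bool PartialRTCheckingAllowed;
};

// Toy cache keyed by loop id only, mirroring the pattern in getInfo().
class ToyAnalysisCache {
  std::map<int, std::unique_ptr<ToyLoopInfo>> Cache;

public:
  const ToyLoopInfo &getInfo(int LoopId,
                             bool PartialRTCheckingAllowed = false) {
    std::unique_ptr<ToyLoopInfo> &Entry = Cache[LoopId];
    // The flag is consulted only when the entry is first created. Later
    // callers get the cached result regardless of the flag they pass.
    if (!Entry)
      Entry.reset(new ToyLoopInfo{PartialRTCheckingAllowed});
    return *Entry;
  }
};
```

If one client asks for loop 1 with the flag set and another client later asks for the same loop with the default, the second client silently receives a result computed in partial-checking mode. Making the flag part of the cache key, or keeping the default result unchanged and recording the unchecked pointers separately as suggested in the thread, would avoid this.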

Moved the NFC changes of LoopVersioningLICM to https://reviews.llvm.org/D25464

Revision Contents

Path                                                           Size
include/llvm/Analysis/LoopAccessAnalysis.h                     38 lines
include/llvm/Transforms/Utils/LoopVersioning.h                 10 lines
lib/Analysis/LoopAccessAnalysis.cpp                            20 lines
lib/Transforms/Scalar/LoopVersioningLICM.cpp                   472 lines
lib/Transforms/Utils/LoopVersioning.cpp                        3 lines
test/Transforms/LoopVersioningLICM/copying-bytes-loop-01.ll    45 lines
test/Transforms/LoopVersioningLICM/copying-bytes-loop-02.ll    49 lines
test/Transforms/LoopVersioningLICM/loopversioningLICM1.ll      4 lines
test/Transforms/LoopVersioningLICM/loopversioningLICM2.ll      2 lines

Diff 73430

include/llvm/Analysis/LoopAccessAnalysis.h

@@ 353 lines hidden @@ struct PointerInfo {
     PointerInfo(Value *PointerValue, const SCEV *Start, const SCEV *End,
                 bool IsWritePtr, unsigned DependencySetId, unsigned AliasSetId,
                 const SCEV *Expr)
         : PointerValue(PointerValue), Start(Start), End(End),
           IsWritePtr(IsWritePtr), DependencySetId(DependencySetId),
           AliasSetId(AliasSetId), Expr(Expr) {}
   };

-  RuntimePointerChecking(ScalarEvolution *SE) : Need(false), SE(SE) {}
+  RuntimePointerChecking(ScalarEvolution *SE,
+                         bool PartialRTCheckingAllowed = false)
+      : Need(false), SE(SE),
+        PartialRTChecking(PartialRTCheckingAllowed) {}

   /// Reset the state of the pointer runtime information.
   void reset() {
     Need = false;
     Pointers.clear();
     Checks.clear();
   }

@@ 95 lines hidden @@ public:
   /// index \p I and \p J to prove their independence.
   bool needsChecking(unsigned I, unsigned J) const;

   /// \brief Return PointerInfo for pointer at index \p PtrIdx.
   const PointerInfo &getPointerInfo(unsigned PtrIdx) const {
     return Pointers[PtrIdx];
   }

+  /// \brief Return true if it is allowed to have an incomplete set of runtime
+  /// checks of pointers. 'incomplete set' means the set does not have checks
+  /// for all pointers because bounds of some pointers can be unknown.
+  bool isPartialCheckingAllowed() const {
+    return PartialRTChecking;
+  }
+
+  /// \brief Notify RuntimePointerChecking that Ptr has unknown bounds.
+  void pointerWithUnknownBounds(Value *Ptr) {
+    assert(Ptr);
+    PtrsWithUnknownBounds.push_back(Ptr);
+  }
+
+  /// \brief Get a list of pointers which have unknonw bounds.
+  const SmallVectorImpl<Value *> &getPointersWithUnknownBounds() const {
+    return PtrsWithUnknownBounds;
+  }
+
 private:
   /// \brief Groups pointers such that a single memcheck is required
   /// between two different groups. This will clear the CheckingGroups vector
   /// and re-compute it. We will only group dependecies if \p UseDependencies
   /// is true, otherwise we will create a separate group for each pointer.
   void groupChecks(MemoryDepChecker::DepCandidates &DepCands,
                    bool UseDependencies);

   /// Generate the checks and return them.
   SmallVector<PointerCheck, 4>
   generateChecks() const;

   /// Holds a pointer to the ScalarEvolution analysis.
   ScalarEvolution *SE;

   /// \brief Set of run-time checks required to establish independence of
   /// otherwise may-aliasing pointers in the loop.
   SmallVector<PointerCheck, 4> Checks;

+  /// \brief Set of pointers with unknown bounds.
+  SmallVector<Value *, 4> PtrsWithUnknownBounds;
+
+  /// \brief This flag indicates if it is allowed to have incomplete set of
+  /// checks which means there might be no checks for pointers with unknown
+  /// memory bounds. An user of RuntimePointerChecking is responsible for
+  /// creating checks for such pointers.
+  bool PartialRTChecking;
 };

 /// \brief Drive the analysis of memory accesses in the loop
 ///
 /// This class is responsible for analyzing the memory accesses of a loop. It
 /// collects the accesses and then its main helper the AccessAnalysis class
 /// finds and categorizes the dependences in buildDependenceSets.
 ///
@@ 10 lines hidden @@
 /// ScalarEvolution, we will generate run-time checks by emitting a
 /// SCEVUnionPredicate.
 ///
 /// Checks for both memory dependences and the SCEV predicates contained in the
 /// PSE must be emitted in order for the results of this analysis to be valid.
 class LoopAccessInfo {
 public:
   LoopAccessInfo(Loop *L, ScalarEvolution *SE, const TargetLibraryInfo *TLI,
-                 AliasAnalysis *AA, DominatorTree *DT, LoopInfo *LI);
+                 AliasAnalysis *AA, DominatorTree *DT, LoopInfo *LI,
+                 bool PartialRTCheckingAllowed = false);

   // FIXME:
   // Hack for MSVC 2013 which sems like it can't synthesize this even
   // with default keyword:
   // LoopAccessInfo(LoopAccessInfo &&LAI) = default;
   LoopAccessInfo(LoopAccessInfo &&LAI)
       : PSE(std::move(LAI.PSE)), PtrRtChecking(std::move(LAI.PtrRtChecking)),
         DepChecker(std::move(LAI.DepChecker)), TheLoop(LAI.TheLoop),
@@ 210 lines hidden @@ public:

   bool runOnFunction(Function &F) override;

   void getAnalysisUsage(AnalysisUsage &AU) const override;

   /// \brief Query the result of the loop access information for the loop \p L.
   ///
   /// If there is no cached result available run the analysis.
-  const LoopAccessInfo &getInfo(Loop *L);
+  const LoopAccessInfo &getInfo(Loop *L,
+                                bool PartialRTCheckingAllowed = false);

   void releaseMemory() override {
     // Invalidate the cache when the pass is freed.
     LoopAccessInfoMap.clear();
   }

   /// \brief Print the result of the analysis when invoked with -analyze.
   void print(raw_ostream &OS, const Module *M = nullptr) const override;
@@ 54 lines hidden @@

include/llvm/Transforms/Utils/LoopVersioning.h

@@ 50 lines hidden @@ public:
   ///
   /// This allows the loop transform pass to operate on the same loop regardless
   /// of whether versioning was necessary or not:
   ///
   ///    for each loop L:
   ///        analyze L
   ///        if versioning is necessary version L
   ///        transform L
-  void versionLoop() { versionLoop(findDefsUsedOutsideOfLoop(VersionedLoop)); }
+  ///
+  /// \return BasicBlock which contains runtime checks.
+  BasicBlock *versionLoop() {
+    return versionLoop(findDefsUsedOutsideOfLoop(VersionedLoop));
+  }

   /// \brief Same but if the client has already precomputed the set of values
   /// used outside the loop, this API will allows passing that.
-  void versionLoop(const SmallVectorImpl<Instruction *> &DefsUsedOutside);
+  ///
+  /// \return BasicBlock which contains runtime checks.
+  BasicBlock *versionLoop(const SmallVectorImpl<Instruction *> &DefsUsedOutside);

   /// \brief Returns the versioned loop. Control flows here if pointers in the
   /// loop don't alias (i.e. all memchecks passed). (This loop is actually the
   /// same as the original loop that we got constructed with.)
   Loop *getVersionedLoop() { return VersionedLoop; }

   /// \brief Returns the fall-back loop. Control flows here if pointers in the
   /// loop may alias (i.e. one of the memchecks failed).
@@ 81 lines hidden @@

lib/Analysis/LoopAccessAnalysis.cpp

@@ 652 lines hidden @@ for (auto A : AS) {
           // Each access has its own dependence set.
           DepId = RunningDepId++;

         RtCheck.insert(TheLoop, Ptr, IsWrite, DepId, ASId, StridesMap, PSE);

         DEBUG(dbgs() << "LAA: Found a runtime check ptr:" << *Ptr << '\n');
       } else {
         DEBUG(dbgs() << "LAA: Can't find bounds for ptr:" << *Ptr << '\n');
+        if (RtCheck.isPartialCheckingAllowed()) {
+          RtCheck.pointerWithUnknownBounds(Ptr);
+        } else {
         CanDoRT = false;
       }
     }
+        }

   // If we have at least two writes or one write and a read then we need to
   // check them. But there is no need to checks if there is only one
   // dependence set for this alias set.
   //
   // Note that this function computes CanDoRT and NeedRTCheck independently.
   // For example CanDoRT=false, NeedRTCheck=false means that we have a pointer
   // for which we couldn't find the bounds but we don't actually need to emit
@@ 1,278 lines hidden @@ void LoopAccessInfo::collectStridedAccess(Value *MemAccess) {
   DEBUG(dbgs() << "LAA: Found a strided access that we can version");
   DEBUG(dbgs() << " Ptr: " << *Ptr << " Stride: " << *Stride << "\n");
   SymbolicStrides[Ptr] = Stride;
   StrideSet.insert(Stride);
 }

 LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,
                                const TargetLibraryInfo *TLI, AliasAnalysis *AA,
-                               DominatorTree *DT, LoopInfo *LI)
+                               DominatorTree *DT, LoopInfo *LI,
+                               bool PartialRTCheckingAllowed)
     : PSE(llvm::make_unique<PredicatedScalarEvolution>(*SE, *L)),
-      PtrRtChecking(llvm::make_unique<RuntimePointerChecking>(SE)),
+      PtrRtChecking(llvm::make_unique<RuntimePointerChecking>(SE, PartialRTCheckingAllowed)),
       DepChecker(llvm::make_unique<MemoryDepChecker>(*PSE, L)), TheLoop(L),
       NumLoads(0), NumStores(0), MaxSafeDepDistBytes(-1), CanVecMem(false),
       StoreToLoopInvariantAddress(false) {
   if (canAnalyzeLoop())
     analyzeLoop(AA, LI, TLI, DT);
 }

 void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {
@@ 31 lines hidden @@ void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {
   PSE->getUnionPredicate().print(OS, Depth);

   OS << "\n";

   OS.indent(Depth) << "Expressions re-written:\n";
   PSE->print(OS, Depth);
 }

-const LoopAccessInfo &LoopAccessLegacyAnalysis::getInfo(Loop *L) {
+const LoopAccessInfo &LoopAccessLegacyAnalysis::getInfo(Loop *L,
+                                                        bool PartialRTCheckingAllowed) {
   auto &LAI = LoopAccessInfoMap[L];

-  if (!LAI)
-    LAI = llvm::make_unique<LoopAccessInfo>(L, SE, TLI, AA, DT, LI);
+  if (!LAI) {
+    LAI = llvm::make_unique<LoopAccessInfo>(L, SE, TLI, AA, DT, LI,
+                                            PartialRTCheckingAllowed);
+  }

   return *LAI.get();
 }

 void LoopAccessLegacyAnalysis::print(raw_ostream &OS, const Module *M) const {
   LoopAccessLegacyAnalysis &LAA = *const_cast<LoopAccessLegacyAnalysis *>(this);

   for (Loop *TopLevelLoop : *LI)
@@ 76 lines hidden @@

lib/Transforms/Scalar/LoopVersioningLICM.cpp

 //===----------- LoopVersioningLICM.cpp - LICM Loop Versioning ------------===//
 //
 // The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
 //
 // When alias analysis is uncertain about the aliasing between any two accesses,
 // it will return MayAlias. This uncertainty from alias analysis restricts LICM
 // from proceeding further. In cases where alias analysis is uncertain we might
 // use loop versioning as an alternative.
 //
 // Loop Versioning will create a version of the loop with aggressive aliasing
 // assumptions in addition to the original with conservative (default) aliasing
-// assumptions. The version of the loop making aggressive aliasing assumptions
+// assumptions. These two versions of loop will be preceded by a memory runtime
eastig (Author): We can mark only memory operations for which RT checks are created.
// will have all the memory accesses marked as no-alias. These two versions of		// check. This runtime check consists of bound checks for unique memory accessed
// loop will be preceded by a memory runtime check. This runtime check consists		// in loop. It ensures the lack of memory aliasing among memory operations which
// of bound checks for all unique memory accessed in loop, and it ensures the		// can be optimized by LICM. The result of the runtime check determines which of
// lack of memory aliasing. The result of the runtime check determines which of
// the loop versions is executed: If the runtime check detects any memory		// the loop versions is executed: If the runtime check detects any memory
// aliasing, then the original loop is executed. Otherwise, the version with		// aliasing, then the original loop is executed. Otherwise, the version with
// aggressive aliasing assumptions is used.		// aggressive aliasing assumptions is used.
//		//
// Following are the top level steps:		// Following are the top level steps:
//		//
// a) Perform LoopVersioningLICM's feasibility check.		// a) Perform LoopVersioningLICM's feasibility check.
// b) If loop is a candidate for versioning then create a memory bound check,		// b) If loop is a candidate for versioning then create a memory bound check,
// by considering all the memory accesses in loop body.		// by considering all the memory accesses in loop body.
// c) Clone original loop and set all memory accesses as no-alias in new loop.		// c) Clone original loop and set memory accesses having runtime checks
		// as no-alias in new loop.
// d) Set original loop & versioned loop as a branch target of the runtime check		// d) Set original loop & versioned loop as a branch target of the runtime check
// result.		// result.
//		//
// It transforms loop as shown below:		// It transforms loop as shown below:
//		//
// +----------------+		// +----------------+
// |Runtime Memcheck|		// |Runtime Memcheck|
// +----------------+		// +----------------+
[... 99 lines not shown ...]	void llvm::addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
LLVMContext &Context = TheLoop->getHeader()->getContext();		LLVMContext &Context = TheLoop->getHeader()->getContext();
MDNode *NewLoopID = MDNode::get(Context, MDs);		MDNode *NewLoopID = MDNode::get(Context, MDs);
// Set operand 0 to refer to the loop id itself.		// Set operand 0 to refer to the loop id itself.
NewLoopID->replaceOperandWith(0, NewLoopID);		NewLoopID->replaceOperandWith(0, NewLoopID);
TheLoop->setLoopID(NewLoopID);		TheLoop->setLoopID(NewLoopID);
}		}

namespace {		namespace {
		/// \brief This structure keeps information about a pointer loaded from memory
		/// and incremented by a constant value.
		struct LoadedPointerInfo {
		/// A load instruction which loads a pointer.
		const LoadInst *Load;

		/// An increment instruction which increased the loaded pointer.
		const GetElementPtrInst *IncExpr;

		LoadedPointerInfo(const LoadInst *Load, const GetElementPtrInst *Expr):
		Load(Load), IncExpr(Expr) {}
		};

struct LoopVersioningLICM : public LoopPass {		struct LoopVersioningLICM : public LoopPass {
static char ID;		static char ID;

bool runOnLoop(Loop *L, LPPassManager &LPM) override;		bool runOnLoop(Loop *L, LPPassManager &LPM) override;

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequiredID(LCSSAID);		AU.addRequiredID(LCSSAID);
AU.addRequired<LoopAccessLegacyAnalysis>();		AU.addRequired<LoopAccessLegacyAnalysis>();
AU.addRequired<LoopInfoWrapperPass>();		AU.addRequired<LoopInfoWrapperPass>();
AU.addRequiredID(LoopSimplifyID);		AU.addRequiredID(LoopSimplifyID);
AU.addRequired<ScalarEvolutionWrapperPass>();		AU.addRequired<ScalarEvolutionWrapperPass>();
AU.addRequired<TargetLibraryInfoWrapperPass>();
eastig (Author): TargetLibraryInfo is not used in the code.
AU.addPreserved<AAResultsWrapperPass>();		AU.addPreserved<AAResultsWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
}		}

LoopVersioningLICM()		LoopVersioningLICM()
: LoopPass(ID), AA(nullptr), SE(nullptr), LI(nullptr), DT(nullptr),		: LoopPass(ID), AA(nullptr), SE(nullptr),
TLI(nullptr), LAA(nullptr), LAI(nullptr), Changed(false),		LAA(nullptr), LAI(nullptr),
Preheader(nullptr), CurLoop(nullptr), CurAST(nullptr),		CurLoop(nullptr),
LoopDepthThreshold(LVLoopDepthThreshold),		LoopDepthThreshold(LVLoopDepthThreshold),
InvariantThreshold(LVInvarThreshold), LoadAndStoreCounter(0),		InvariantThreshold(LVInvarThreshold), LoadAndStoreCounter(0),
InvariantCounter(0), IsReadOnlyLoop(true) {		IsReadOnlyLoop(true) {
eastig (Author): Removed unused fields or made them function scoped instead of class scoped.
initializeLoopVersioningLICMPass(*PassRegistry::getPassRegistry());		initializeLoopVersioningLICMPass(*PassRegistry::getPassRegistry());
}		}
		const char *getPassName() const override { return "Loop Versioning for LICM"; }

		private:
		eastig (Author): Made class fields private because there is no need to have them public.
AliasAnalysis *AA; // Current AliasAnalysis information		AliasAnalysis *AA; // Current AliasAnalysis information
ScalarEvolution *SE; // Current ScalarEvolution		ScalarEvolution *SE; // Current ScalarEvolution
LoopInfo *LI; // Current LoopInfo
DominatorTree *DT; // Dominator Tree for the current Loop.
TargetLibraryInfo *TLI; // TargetLibraryInfo for constant folding.
LoopAccessLegacyAnalysis *LAA; // Current LoopAccessAnalysis		LoopAccessLegacyAnalysis *LAA; // Current LoopAccessAnalysis
const LoopAccessInfo *LAI; // Current Loop's LoopAccessInfo		const LoopAccessInfo *LAI; // Current Loop's LoopAccessInfo

bool Changed; // Set to true when we change anything.
BasicBlock *Preheader; // The preheader block of the current loop.
Loop *CurLoop; // The current loop we are working on.		Loop *CurLoop; // The current loop we are working on.
AliasSetTracker *CurAST; // AliasSet information for the current loop.		std::unique_ptr<AliasSetTracker> CurAST; // AliasSet information for the current loop.
ValueToValueMap Strides;
		// A list of pointers loaded from memory in the current loop.
		SmallVector<LoadedPointerInfo, 4> LoadedPointers;

		// A list of pointers which are invariants of the current loop.
		SmallPtrSet<Value *, 4> LoopInvPtrs;

unsigned LoopDepthThreshold; // Maximum loop nest threshold		unsigned LoopDepthThreshold; // Maximum loop nest threshold
float InvariantThreshold; // Minimum invariant threshold		float InvariantThreshold; // Minimum invariant threshold
unsigned LoadAndStoreCounter; // Counter to track num of load & store		unsigned LoadAndStoreCounter; // Counter to track num of load & store
unsigned InvariantCounter; // Counter to track num of invariant
bool IsReadOnlyLoop; // Read only loop marker.		bool IsReadOnlyLoop; // Read only loop marker.

bool isLegalForVersioning();		bool isLegalForVersioning();
bool legalLoopStructure();		bool legalLoopStructure();
bool legalLoopInstructions();		bool legalLoopInstructions();
bool legalLoopMemoryAccesses();		bool legalLoopMemoryAccesses();
bool isLoopAlreadyVisited();		bool isLoopAlreadyVisited();
void setNoAliasToLoop(Loop *);		void annotateLoopWithNoAlias(LoopVersioning &LVer);
bool instructionSafeForVersioning(Instruction *);		bool instructionSafeForVersioning(Instruction *);
const char *getPassName() const override { return "Loop Versioning"; }		bool processPointersWithUnknownBounds();
		bool canLoadedPtrHaveBounds(const LoadInst *Load);
		void addLoadedPointersRTChecks(BasicBlock *RuntimeCheckBB);
};		};
}		}

/// \brief Check loop structure and confirms it's good for LoopVersioningLICM.		/// \brief Check loop structure and confirms it's good for LoopVersioningLICM.
bool LoopVersioningLICM::legalLoopStructure() {		bool LoopVersioningLICM::legalLoopStructure() {
// Loop must have a preheader, if not return false.		// Loop must have a preheader, if not return false.
if (!CurLoop->getLoopPreheader()) {		if (!CurLoop->getLoopPreheader()) {
DEBUG(dbgs() << " loop preheader is missing\n");		DEBUG(dbgs() << " loop preheader is missing\n");
[... 46 lines not shown ...]
}		}
return true;		return true;
}		}

/// \brief Check memory accesses in loop and confirms it's good for		/// \brief Check memory accesses in loop and confirms it's good for
/// LoopVersioningLICM.		/// LoopVersioningLICM.
bool LoopVersioningLICM::legalLoopMemoryAccesses() {		bool LoopVersioningLICM::legalLoopMemoryAccesses() {
bool HasMayAlias = false;		bool HasMayAlias = false;
bool TypeSafety = false;
bool HasMod = false;		bool HasMod = false;
// Memory check:		// Memory check:
// Transform phase will generate a versioned loop and also a runtime check to		// Transform phase will generate a versioned loop and also a runtime check to
// ensure the pointers are independent and they don’t alias.		// ensure the pointers are independent and they don’t alias.
// In version variant of loop, alias meta data asserts that all access are		// In version variant of loop, alias meta data asserts that all access are
// mutually independent.		// mutually independent.
//		//
// Pointers aliasing in alias domain are avoided because with multiple		// Pointers aliasing in alias domain are avoided because with multiple
// aliasing domains we may not be able to hoist potential loop invariant		// aliasing domains we may not be able to hoist potential loop invariant
// access out of the loop.		// access out of the loop.
//		//
// Iterate over alias tracker sets, and confirm AliasSets doesn't have any		// Iterate over alias tracker sets, and confirm AliasSets doesn't have any
// must alias set.		// must alias set.
for (const auto &I : *CurAST) {		for (const auto &I : *CurAST) {
const AliasSet &AS = I;		const AliasSet &AS = I;
// Skip Forward Alias Sets, as this should be ignored as part of		// Skip Forward Alias Sets, as this should be ignored as part of
// the AliasSetTracker object.		// the AliasSetTracker object.
if (AS.isForwardingAliasSet())		if (AS.isForwardingAliasSet())
continue;		continue;
// With MustAlias its not worth adding runtime bound check.		// With MustAlias its not worth adding runtime bound check.
if (AS.isMustAlias())		if (AS.isMustAlias())
return false;		return false;
Value *SomePtr = AS.begin()->getValue();
bool TypeCheck = true;
// Check for Mod & MayAlias		// Check for Mod & MayAlias
HasMayAlias |= AS.isMayAlias();		HasMayAlias |= AS.isMayAlias();
HasMod |= AS.isMod();		HasMod |= AS.isMod();
for (const auto &A : AS) {
Value *Ptr = A.getValue();
// Alias tracker should have pointers of same data type.
TypeCheck = (TypeCheck && (SomePtr->getType() == Ptr->getType()));
}
// At least one alias tracker should have pointers of same data type.
TypeSafety |= TypeCheck;
}
// Ensure types should be of same type.
if (!TypeSafety) {
DEBUG(dbgs() << " Alias tracker type safety failed!\n");
return false;
eastig (Author): TypeCheck is needed when we deal with an AliasSet with MustAlias. E.g. char * and char ** pointers may be aliased but they have different types.
}		}
// Ensure loop body shouldn't be read only.		// Ensure loop body shouldn't be read only.
if (!HasMod) {		if (!HasMod) {
DEBUG(dbgs() << " No memory modified in loop body\n");		DEBUG(dbgs() << " No memory modified in loop body\n");
return false;		return false;
}		}
// Make sure alias set has may alias case.		// Make sure alias set has may alias case.
// If there no alias memory ambiguity, return false.		// If there no alias memory ambiguity, return false.
[... 29 lines not shown ...]	if (I->mayReadFromMemory()) {
if (!Ld || !Ld->isSimple()) {		if (!Ld || !Ld->isSimple()) {
DEBUG(dbgs() << " Found a non-simple load.\n");		DEBUG(dbgs() << " Found a non-simple load.\n");
return false;		return false;
}		}
LoadAndStoreCounter++;		LoadAndStoreCounter++;
Value *Ptr = Ld->getPointerOperand();		Value *Ptr = Ld->getPointerOperand();
// Check loop invariant.		// Check loop invariant.
if (SE->isLoopInvariant(SE->getSCEV(Ptr), CurLoop))		if (SE->isLoopInvariant(SE->getSCEV(Ptr), CurLoop))
InvariantCounter++;		LoopInvPtrs.insert(Ptr);
}		}
// If current instruction is store instruction		// If current instruction is store instruction
// make sure it's a simple store (non atomic & non volatile)		// make sure it's a simple store (non atomic & non volatile)
else if (I->mayWriteToMemory()) {		else if (I->mayWriteToMemory()) {
StoreInst *St = dyn_cast<StoreInst>(I);		StoreInst *St = dyn_cast<StoreInst>(I);
if (!St || !St->isSimple()) {		if (!St || !St->isSimple()) {
DEBUG(dbgs() << " Found a non-simple store.\n");		DEBUG(dbgs() << " Found a non-simple store.\n");
return false;		return false;
}		}
LoadAndStoreCounter++;		LoadAndStoreCounter++;
Value *Ptr = St->getPointerOperand();		Value *Ptr = St->getPointerOperand();
// Check loop invariant.		// Check loop invariant.
if (SE->isLoopInvariant(SE->getSCEV(Ptr), CurLoop))		if (SE->isLoopInvariant(SE->getSCEV(Ptr), CurLoop))
InvariantCounter++;		LoopInvPtrs.insert(Ptr);

IsReadOnlyLoop = false;		IsReadOnlyLoop = false;
}		}
return true;		return true;
}		}

		/// \brief Check that other users of V other than U are only memory operations.
		static bool areOtherUsersOnlyMemOps(const Value *V, const User *U) {
		assert(V);
		assert(U);
		assert(!V->hasOneUse());
		for (const User *OU: V->users()) {
		if (const Instruction *I = dyn_cast<Instruction>(OU)) {
		if (!I->mayReadOrWriteMemory() && (I != U))
		return false;
		} else
		return false;
		}
		return true;
		}

		/// \brief Check if a pointer loaded by Load can have bounds.
		///
		/// Requirements:
		/// 1. Load is simple.
		/// 2. Load pointer operand is a loop invariant.
		/// 3. The loaded pointer is increased with GetElementPtrInst.
		/// 4. Other users of the loaded pointer different from GetElementPtrInst,
		/// if any, are only memory instructions.
		/// 5. GEP has one constant index.
		/// 6. GEP has one user.
		/// 7. The user is StoreInst.
		/// 8. Store is simple.
		/// 9. Store pointer operand is the same as Load pointer operand.
		///
		/// \param Load An instruction loading a pointer.
		/// \return true if a loaded pointer can have bounds.
		bool LoopVersioningLICM::canLoadedPtrHaveBounds(const LoadInst * Load) {
		ashutosh.nema: This function looks for a very specific pattern; it needs to be generalized.
		eastig (Author): Yes, it's currently for a very specific pattern which we've got from benchmarks and real code. To generalize the function I need to have more patterns.
		assert(Load);

		DEBUG(dbgs() << " Checking a pointer loaded by: " << *Load << "\n");

		if (!Load->isSimple()) {
		DEBUG(dbgs() << " Failed: load is not simple.\n");
		return false;
		}

		const Value *LoadAddr = Load->getPointerOperand();
		if (!CurLoop->isLoopInvariant(LoadAddr)) {
		DEBUG(dbgs() << " Failed: load pointer operand is not a loop invariant.\n");
		return false;
		}

		const GetElementPtrInst *GEP = nullptr;
		for (const User *U: Load->users()) {
		GEP = dyn_cast<GetElementPtrInst>(U);
		if (GEP)
		break;
		}
		if (!GEP) {
		DEBUG(dbgs() << " Failed: No GetElementPtrInst using the loaded pointer"
		<< " is found.\n");
		return false;
		}

		if (!Load->hasOneUse() && !areOtherUsersOnlyMemOps(Load, GEP)) {
		ashutosh.nema: Why restrict this to a single load whose users are memory operations and a GEP?
		eastig (Author): Because this is the current pattern we want to recognize. More uses might create a sophisticated DFG which is not worth analyzing. Such cases should be discovered first and then analyzed.
		DEBUG(dbgs() << " Failed: Non-memory operation different from GEP uses Load\n");
		return false;
		}

		if (GEP->getNumIndices() != 1) {
		DEBUG(dbgs() << " Failed: GEP has more than one index.\n");
		return false;
		}

		if (!GEP->hasAllConstantIndices()) {
		DEBUG(dbgs() << " Failed: GEP has a non-constant index.\n");
		return false;
		}

		if (!GEP->hasOneUse()) {
		ashutosh.nema: Why one use?
		eastig (Author): The same answer as above. Lack of real use cases.
		DEBUG(dbgs() << " Failed: GEP has more than one users.\n");
		return false;
		}

		const StoreInst *Store = dyn_cast<StoreInst>(*GEP->user_begin());
		if (!Store) {
		DEBUG(dbgs() << " Failed: GEP user is not StoreInst.\n");
		return false;
		}

		if (!Store->isSimple()) {
		DEBUG(dbgs() << " Failed: store is not simple.\n");
		return false;
		}

		const Value *StoreAddr = Store->getPointerOperand();
		if (StoreAddr != LoadAddr) {
		DEBUG(dbgs() << " Failed: Store address is not the same as Load address.\n");
		return false;
		}

		LoadedPointers.push_back(LoadedPointerInfo(Load, GEP));

		return true;
		}

		/// \brief Process pointers with unknown bounds.
		///
		/// \return true if all pointers are loaded pointers and can have bounds.
		bool LoopVersioningLICM::processPointersWithUnknownBounds() {
		assert(CurLoop);
		assert(LAI);
		assert(LAI->getRuntimePointerChecking());

		const SmallVectorImpl<Value *> &PtrsWithUnknBnds = LAI->getRuntimePointerChecking()
		->getPointersWithUnknownBounds();
		if (PtrsWithUnknBnds.empty())
		return true;

		for (auto Ptr: PtrsWithUnknBnds) {
		const LoadInst * Load = dyn_cast<LoadInst>(Ptr);
		if (!Load) {
		DEBUG(dbgs() << " It is not a loaded pointer: " << *Ptr << "\n");
		ashutosh.nema: The current LoopVersioningLICM supports loop nests, i.e. it targets the innermost loop. This behavior should remain supported.
		eastig (Author): The old cases work as before; nothing is changed. The behaviour is only changed for the new cases. Loads are very heavy operations, so it is too dangerous to move them from an inner loop to an outer loop, because the outer loop might have 1000 iterations and the inner loop might have 10. If there are aliased pointers, an operation loading a pointer will be executed 11000 times instead of 10000 times: 1000 times in the RT checking basic block + 10000 times in the original loop. To make a proper decision an execution profile should be used.
		ashutosh.nema: Agreed, it does not change the existing behavior, but why are you enforcing such a restriction for the new cases? In your example it can be the other way around as well, where the inner loop has 1000 iterations and the outer loop has 10 iterations, and it is actually beneficial to hoist the load from the inner loop to the outer loop. LoopVersioningLICM does not make any hoisting decision; the actual decision and hoisting will be done later by LICM. Are you getting issues/degradations by allowing inner loops?
		eastig (Author): "Are you getting issues/degrades by allowing inner loops?" No, I have not seen any issues or performance regressions yet. I finally made runs with the pass enabled by default as a part of the middle-end. Before that I had manually run opt with the pass and then llc. I've got +20%...+48.5% performance improvement. This restriction is fully based on my experience of implementing different optimizations for loop nests. I will remove the restriction.
		return false;
		}

		if (!canLoadedPtrHaveBounds(Load))
		return false;
		}

		ashutosh.nema: Writes to memory were collected earlier as well; why are they not considered here?
		eastig (Author): LoopAccessAnalysis collects pointers in terms of Values. Writes are users of those pointers. So for each pointer in a loop we have an instruction defining it.
		return true;
		}
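
A minimal sketch of what versioning buys for the `mem_copy_01` loop from the summary, written by hand in C++ rather than produced by the pass. The names `memCopyOriginal`, `memCopyVersioned` and `overlapsPtrObject` are hypothetical, `bytes_count >= 0` is assumed, and the overlap test is an exact byte-overlap check standing in for the checks the patch emits as IR.

```cpp
#include <cstdint>

// Loop from the revision summary, kept as the conservative fallback.
void memCopyOriginal(char **dst, char **src, int bytes_count) {
  while (bytes_count--)
    *((*dst)++) = *((*src)++);
}

// True if the pointer-sized object at Obj overlaps the bytes [First, Last].
static bool overlapsPtrObject(const void *Obj, const char *First,
                              const char *Last) {
  const auto O = reinterpret_cast<std::uintptr_t>(Obj);
  const auto F = reinterpret_cast<std::uintptr_t>(First);
  const auto L = reinterpret_cast<std::uintptr_t>(Last);
  return O <= L && F < O + sizeof(char *);
}

// Hypothetical hand-versioned equivalent; assumes bytes_count >= 0.
void memCopyVersioned(char **dst, char **src, int bytes_count) {
  if (bytes_count <= 0)
    return;
  char *d = *dst;
  char *s = *src;
  // Runtime check: the invariant pointers must be distinct, and neither
  // pointer object may lie inside the byte ranges the loop touches.
  const bool Conflict = dst == src ||
                        overlapsPtrObject(dst, d, d + bytes_count - 1) ||
                        overlapsPtrObject(src, d, d + bytes_count - 1) ||
                        overlapsPtrObject(dst, s, s + bytes_count - 1) ||
                        overlapsPtrObject(src, s, s + bytes_count - 1);
  if (Conflict) {
    memCopyOriginal(dst, src, bytes_count);
    return;
  }
  // No aliasing: loads of *dst and *src are hoisted out of the loop and
  // the updated pointers are stored back once after it.
  for (int i = 0; i < bytes_count; ++i)
    d[i] = s[i];
  *dst = d + bytes_count;
  *src = s + bytes_count;
}
```

In the fast path the loop body contains one byte load and one byte store per iteration, which is the effect LICM is expected to reach once the versioned loop carries no-alias annotations.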

		/// \brief Add runtime checks for loaded pointers.
		///
		/// After loop versioning there will be RT checks of invariant pointers and
		/// pointers with known bounds. Other pointers are defined and used as follows:
		///
		/// Ptr = Load(InvPtr)
		/// NewPtr = GEP(Ptr, Const)
		/// Store(NewPtr, InvPtr)
		/// Mem_operations using Ptr
		///
		/// In addition to the created RT checks InvPtr needs to be checked with
		/// pointers loaded as in the case above. If loaded pointers are not aliased with
		/// invariant pointers at some iteration then at the next iteration their values
		/// are the values defined by GEP instructions.
		/// So without aliasing Ptr has values:
		///
		/// [Ptr0, Ptr0 + (number_of_iterations-1) * type_size * GEP_index], where
		/// Ptr0 is Load(InvPtr) at the first iteration
		///
		/// We need to guarantee that Ptr0 never reaches InvPtr:
		/// 4_or_8_aligned(Ptr0) != InvPtr : iteration 1
		/// 4_or_8_aligned(Ptr0 + type_size*GEP_index) != InvPtr : iteration 2
		/// 4_or_8_aligned(Ptr0 + 2*type_size*GEP_index) != InvPtr : iteration 3
		/// ...
		///
		/// Aligned Ptr0 is used because InvPtr is a pointer to a pointer and
		/// it's aligned either 4 or 8 bytes.
		///
		/// We can do more strict checking: if InvPtr is not in
		/// [4_or_8_aligned(Ptr0), Ptr0 + (number_of_iterations-1) * type_size * GEP_index]
		///
		/// 'Ptr0 + (number_of_iterations-1) * type_size * GEP_index' is used instead of
		/// the aligned version because
		/// 4_or_8_aligned(Ptr0 + (number_of_iterations-1) * type_size * GEP_index) <=
		/// Ptr0 + (number_of_iterations-1) * type_size * GEP_index
		///
		/// \param RuntimeCheckBB BasicBlock where LoopVersioning has added checks.
		void LoopVersioningLICM::addLoadedPointersRTChecks(BasicBlock *RuntimeCheckBB) {
		assert(RuntimeCheckBB);
		assert(!LoopInvPtrs.empty());
		assert(RuntimeCheckBB->getTerminator());
		assert(isa<BranchInst>(RuntimeCheckBB->getTerminator()));

		if (LoadedPointers.empty())
		return;

		const DataLayout &DL = RuntimeCheckBB->getModule()->getDataLayout();

		IRBuilder<> ChkBuilder(RuntimeCheckBB->getTerminator());

		// Get a branch condition which will be OR with additional checking results.
		Value *BranchCond = cast<BranchInst>(RuntimeCheckBB->getTerminator())
		->getCondition();

		const SCEV *CurLoopBackedgeTakenCount = SE->getBackedgeTakenCount(CurLoop);
		assert(CurLoopBackedgeTakenCount);

		// Insert calculation of BackedgeTakenCount into the basic block:
		// %0 = add i32 %num_iter, -1
		SCEVExpander Exp(*SE, RuntimeCheckBB->getModule()->getDataLayout(), "ind");
		Value *BackedgeTakenCountI32 = Exp.expandCodeFor(CurLoopBackedgeTakenCount,
		CurLoopBackedgeTakenCount->getType(), RuntimeCheckBB->getTerminator());
		Value *BackedgeTakenCountI64 = nullptr;
		if (BackedgeTakenCountI32->getType()->isIntegerTy(64)) {
		BackedgeTakenCountI64 = BackedgeTakenCountI32;
		BackedgeTakenCountI32 = nullptr;
		} else {
		BackedgeTakenCountI64 = ChkBuilder.CreateZExt(BackedgeTakenCountI32,
		Type::getInt64Ty(RuntimeCheckBB->getContext()));
		}

		for (auto &LoadedPointer: LoadedPointers) {
		// Create a copy of an instruction loading a pointer and put it into
		// the RT check basic block. This is Ptr0.
		Instruction *Load = LoadedPointer.Load->clone();
		Load->setName("ptr0");
		RuntimeCheckBB->getInstList().insert(RuntimeCheckBB->getTerminator()->getIterator(),
		Load);

		// Create 4_or_8_aligned(Ptr0):
		// %pti = ptrtoint i8* %ptr0 to i32
		// %pa = and i32 %pti, -4
		// %p0s = inttoptr i32 %pa to i8*
		Type *LoadedPtrArithTy = Type::getInt8PtrTy(RuntimeCheckBB->getContext(),
		Load->getType()->getPointerAddressSpace());
		Value *LoadedPtrStart = ChkBuilder.CreatePtrToInt(Load,
		DL.getIntPtrType(Load->getType()), "pti");
		const uint64_t AlignmentMask = ~(DL.getPointerTypeSize(Load->getType()) - 1);
		LoadedPtrStart = ChkBuilder.CreateAnd(LoadedPtrStart, AlignmentMask, "pa");
		LoadedPtrStart = ChkBuilder.CreateIntToPtr(LoadedPtrStart, LoadedPtrArithTy,
		"p0s");

		// Create Ptr0 + (number_of_iterations-1) * type_size * GEP_index
		// %incdec.ptr.p0e = getelementptr i8, i8* %ptr0, i32 %0
		Value *LHS = BackedgeTakenCountI64;
		Value *RHS = LoadedPointer.IncExpr->getOperand(1);
		assert(RHS->getType()->isIntegerTy(32) || RHS->getType()->isIntegerTy(64));
		if (RHS->getType()->isIntegerTy(32)) {
		if (BackedgeTakenCountI32) {
		LHS = BackedgeTakenCountI32;
		} else {
		RHS = ChkBuilder.CreateZExt(RHS,
		Type::getInt64Ty(RuntimeCheckBB->getContext()));
		}
		}
		Value *Idx = ChkBuilder.CreateMul(LHS, RHS, "idx");
		Value *IdxList[1] = {Idx};
		Value *GEP = ChkBuilder.CreateGEP(Load, IdxList,
		LoadedPointer.IncExpr->getName() + ".p0e");
		assert(Load->getType()->getPointerAddressSpace()
		== GEP->getType()->getPointerAddressSpace());
		Value *LoadedPtrEnd = ChkBuilder.CreateBitCast(GEP, LoadedPtrArithTy, "p0e");

		// Create check for all invariant pointers.
		for (auto PtrInvariant: LoopInvPtrs) {
		// %invp = bitcast i8** %src to i8*
		Type *PtrInvariantArithTy = Type::getInt8PtrTy(RuntimeCheckBB->getContext(),
		PtrInvariant->getType()->getPointerAddressSpace());
		Value *PtrInvariantBC = ChkBuilder.CreateBitCast(PtrInvariant,
		PtrInvariantArithTy, "invp");

		// 4_or_8_aligned(Ptr0) <= InvPtr
		// %bound02 = icmp ule i8* %p0s, %invp
		Value *Cmp0 = ChkBuilder.CreateICmpULE(LoadedPtrStart, PtrInvariantBC,
		"bound0");

		// InvPtr <= Ptr0 + (number_of_iterations-1) * type_size * GEP_index
		// %bound13 = icmp ule i8* %invp, %incdec.ptr.p0e
		Value *Cmp1 = ChkBuilder.CreateICmpULE(PtrInvariantBC, LoadedPtrEnd,
		"bound1");

		Value *IsConflict = ChkBuilder.CreateAnd(Cmp0, Cmp1, "found.conflict");

		// Update the branch condition
		BranchCond = ChkBuilder.CreateOr(BranchCond, IsConflict, "conflict.rdx");
		}
		}
		cast<BranchInst>(RuntimeCheckBB->getTerminator())->setCondition(BranchCond);
		}
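
The conflict test that `addLoadedPointersRTChecks` builds as IR can be modelled on plain integers. The following is a sketch under stated assumptions, not code from the patch: `mayConflict` and its parameter names are hypothetical, addresses are treated as 64-bit integers, the step `StepBytes` (type_size * GEP_index) is assumed positive, and `PtrSize` is assumed to be a power of two.

```cpp
#include <cstdint>

// Hypothetical model of the emitted check:
//   aligned(Ptr0) <= InvPtr && InvPtr <= Ptr0 + BackedgeTakenCount * StepBytes
// Returns true when the invariant pointer may fall inside the range walked
// by the loaded pointer, i.e. when the original loop must be executed.
bool mayConflict(std::uint64_t Ptr0, std::uint64_t InvPtr,
                 std::uint64_t BackedgeTakenCount, std::uint64_t StepBytes,
                 std::uint64_t PtrSize) {
  // Round Ptr0 down to the pointer-size boundary (4 or 8 bytes).
  const std::uint64_t AlignmentMask = ~(PtrSize - 1);
  const std::uint64_t Start = Ptr0 & AlignmentMask;
  // Last address reached by the loaded pointer.
  const std::uint64_t End = Ptr0 + BackedgeTakenCount * StepBytes;
  return Start <= InvPtr && InvPtr <= End;
}
```

For example, with `Ptr0 = 0x1002`, a 4-byte pointer size and 10 iterations of a byte copy, the range checked is `[0x1000, 0x100B]`, so an invariant pointer at `0x1000` is reported as a conflict while one at `0x100C` is not.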

/// \brief Check loop instructions and confirms it's good for		/// \brief Check loop instructions and confirms it's good for
/// LoopVersioningLICM.		/// LoopVersioningLICM.
bool LoopVersioningLICM::legalLoopInstructions() {		bool LoopVersioningLICM::legalLoopInstructions() {
// Resetting counters.		// Resetting counters.
LoadAndStoreCounter = 0;		LoadAndStoreCounter = 0;
InvariantCounter = 0;
IsReadOnlyLoop = true;		IsReadOnlyLoop = true;
// Iterate over loop blocks and instructions of each block and check		// Iterate over loop blocks and instructions of each block and check
// instruction safety.		// instruction safety.
for (auto *Block : CurLoop->getBlocks())		for (auto *Block : CurLoop->getBlocks())
for (auto &Inst : *Block) {		for (auto &Inst : *Block) {
// If instruction is unsafe just return false.		// If instruction is unsafe just return false.
if (!instructionSafeForVersioning(&Inst))		if (!instructionSafeForVersioning(&Inst))
return false;		return false;
}		}
// Get LoopAccessInfo from current loop.		// Get LoopAccessInfo from current loop.
LAI = &LAA->getInfo(CurLoop);		const bool PartialRTCheckingAllowed = true;
		LAI = &LAA->getInfo(CurLoop, PartialRTCheckingAllowed);
// Check LoopAccessInfo for need of runtime check.		// Check LoopAccessInfo for need of runtime check.
if (LAI->getRuntimePointerChecking()->getChecks().empty()) {		if (LAI->getRuntimePointerChecking()->getChecks().empty()) {
DEBUG(dbgs() << " LAA: Runtime check not found !!\n");		DEBUG(dbgs() << " LAA: Runtime check not found !!\n");
return false;		return false;
}		}
// Number of runtime-checks should be less then RuntimeMemoryCheckThreshold		// Number of runtime-checks should be less then RuntimeMemoryCheckThreshold
if (LAI->getNumRuntimePointerChecks() >		if (LAI->getNumRuntimePointerChecks() >
VectorizerParams::RuntimeMemoryCheckThreshold) {		VectorizerParams::RuntimeMemoryCheckThreshold) {
DEBUG(dbgs() << " LAA: Runtime checks are more than threshold !!\n");		DEBUG(dbgs() << " LAA: Runtime checks are more than threshold !!\n");
return false;		return false;
}		}
// Loop should have at least one invariant load or store instruction.		// Loop should have at least one invariant load or store instruction.
		const SmallVectorImpl<Value *>::size_type InvariantCounter = LoopInvPtrs.size();
if (!InvariantCounter) {		if (!InvariantCounter) {
DEBUG(dbgs() << " Invariant not found !!\n");		DEBUG(dbgs() << " Invariant not found !!\n");
return false;		return false;
}		}
// Read only loop not allowed.		// Read only loop not allowed.
if (IsReadOnlyLoop) {		if (IsReadOnlyLoop) {
DEBUG(dbgs() << " Found a read-only loop!\n");		DEBUG(dbgs() << " Found a read-only loop!\n");
return false;		return false;
}		}
// Profitablity check:		// Profitablity check:
// Check invariant threshold, should be in limit.		// Check invariant threshold, should be in limit.
if (InvariantCounter * 100 < InvariantThreshold * LoadAndStoreCounter) {		if (InvariantCounter * 100 < InvariantThreshold * LoadAndStoreCounter) {
DEBUG(dbgs()		DEBUG(dbgs()
<< " Invariant load & store are less then defined threshold\n");		<< " Invariant load & store are less then defined threshold\n");
DEBUG(dbgs() << " Invariant loads & stores: "		DEBUG(dbgs() << " Invariant loads & stores: "
<< ((InvariantCounter * 100) / LoadAndStoreCounter) << "%\n");		<< ((InvariantCounter * 100) / LoadAndStoreCounter) << "%\n");
DEBUG(dbgs() << " Invariant loads & store threshold: "		DEBUG(dbgs() << " Invariant loads & store threshold: "
<< InvariantThreshold << "%\n");		<< InvariantThreshold << "%\n");
return false;		return false;
}		}

		if (!processPointersWithUnknownBounds()) {
		return false;
		}

return true;		return true;
}		}
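
Note on the profitability check above: the comparison `InvariantCounter * 100 < InvariantThreshold * LoadAndStoreCounter` tests a percentage without dividing, so it cannot divide by zero or lose precision to integer truncation. A minimal standalone sketch of that comparison follows. The helper name `meetsInvariantThreshold` and the widening to 64-bit are assumptions made for illustration and are not part of the patch.

```cpp
#include <cstdint>

// Hypothetical helper (not in the patch): returns true when invariant
// loads/stores make up at least ThresholdPercent of all loads/stores.
// Cross-multiplying avoids the division used in the debug output.
bool meetsInvariantThreshold(std::uint64_t InvariantCounter,
                             std::uint64_t LoadAndStoreCounter,
                             std::uint64_t ThresholdPercent) {
  return InvariantCounter * 100 >= ThresholdPercent * LoadAndStoreCounter;
}
```

With a threshold of 25, one invariant access out of four passes, while one out of five does not.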

/// \brief It checks loop is already visited or not.		/// \brief It checks loop is already visited or not.
/// check loop meta data, if loop revisited return true		/// check loop meta data, if loop revisited return true
/// else false.		/// else false.
bool LoopVersioningLICM::isLoopAlreadyVisited() {		bool LoopVersioningLICM::isLoopAlreadyVisited() {
// Check LoopVersioningLICM metadata into loop		// Check LoopVersioningLICM metadata into loop
(33 lines not shown)
DEBUG(dbgs()		DEBUG(dbgs()
<< " Loop memory access not suitable for LoopVersioningLICM\n\n");		<< " Loop memory access not suitable for LoopVersioningLICM\n\n");
return false;		return false;
}		}
// Loop versioning is feasible, return true.		// Loop versioning is feasible, return true.
DEBUG(dbgs() << " Loop Versioning found to be beneficial\n\n");		DEBUG(dbgs() << " Loop Versioning found to be beneficial\n\n");
return true;		return true;
}		}

/// \brief Update loop with aggressive aliasing assumptions.		/// \brief Annotate loop memory operations with aliasing information.
/// It marks no-alias to any pairs of memory operations by assuming		///
/// loop should not have any must-alias memory accesses pairs.		/// 1. Annotate operations for which LoopVersioning has created checks.
/// During LoopVersioningLICM legality we ignore loops having must		/// 2. Annotate operations for which we have created checks.
/// aliasing memory accesses.		void LoopVersioningLICM::annotateLoopWithNoAlias(LoopVersioning &LVer) {
void LoopVersioningLICM::setNoAliasToLoop(Loop *VerLoop) {		// Annotate operations for which LoopVersioning has created checks.
// Get latch terminator instruction.		LVer.annotateLoopWithNoAlias();
Instruction *I = VerLoop->getLoopLatch()->getTerminator();
// Create alias scope domain.		if (LoadedPointers.empty())
MDBuilder MDB(I->getContext());		return;
MDNode *NewDomain = MDB.createAnonymousAliasScopeDomain("LVDomain");
StringRef Name = "LVAliasScope";		// Annotate operations for which we have created checks.
SmallVector<Metadata *, 4> Scopes, NoAliases;
MDNode *NewScope = MDB.createAnonymousAliasScope(NewDomain, Name);		// Get memory operations which read from/write to invariant pointers.
// Iterate over each instruction of loop.		SmallPtrSet<Instruction *, 16> InvPtrUsers;
// set no-alias for all load & store instructions.		for (auto PtrInv: LoopInvPtrs) {
for (auto *Block : CurLoop->getBlocks()) {		for (User *U: PtrInv->users()) {
for (auto &Inst : *Block) {		if (Instruction *I = dyn_cast<Instruction>(U)) {
// Only interested in instruction that may modify or read memory.		if (I->mayReadOrWriteMemory() && CurLoop->contains(I)) {
if (!Inst.mayReadFromMemory() && !Inst.mayWriteToMemory())		Value *Ptr = nullptr;
continue;		if (LoadInst *L = dyn_cast<LoadInst>(I)) {
Scopes.push_back(NewScope);		Ptr = L->getPointerOperand();
NoAliases.push_back(NewScope);		} else if (StoreInst *S = dyn_cast<StoreInst>(I)) {
// Set no-alias for current instruction.		Ptr = S->getPointerOperand();
Inst.setMetadata(		}
LLVMContext::MD_noalias,		if (Ptr == PtrInv)
MDNode::concatenate(Inst.getMetadata(LLVMContext::MD_noalias),		InvPtrUsers.insert(I);
MDNode::get(Inst.getContext(), NoAliases)));		}
// set alias-scope for current instruction.		}
Inst.setMetadata(		}
LLVMContext::MD_alias_scope,		}
MDNode::concatenate(Inst.getMetadata(LLVMContext::MD_alias_scope),
MDNode::get(Inst.getContext(), Scopes)));		// Now set that users of invariant pointers don't alias with users of
eastig (author, inline comment): Loop Versioning provides an API for annotating a loop with 'no alias' metadata. It's better not to duplicate this functionality.
		// loaded pointers.
		LLVMContext &Context = CurLoop->getHeader()->getContext();
		MDBuilder MDB(Context);
		MDNode *Domain = MDB.createAnonymousAliasScopeDomain("LVerLICMDomain");
		for (auto &LoadedPtr: LoadedPointers) {
		MDNode *MDN = MDB.createAnonymousAliasScope(Domain);
		for (auto &U: LoadedPtr.Load->uses()) {
		Instruction *I = dyn_cast<Instruction>(U.getUser());
		if (I && I->mayReadOrWriteMemory()) {
		I->setMetadata(LLVMContext::MD_alias_scope,
		MDNode::concatenate(I->getMetadata(LLVMContext::MD_alias_scope),
		MDNode::get(Context, MDN)));
		}
		}
		for (auto I: InvPtrUsers) {
		I->setMetadata(LLVMContext::MD_noalias,
		MDNode::concatenate(I->getMetadata(LLVMContext::MD_noalias), MDN));
}		}
}		}
}		}
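
Note on the annotation scheme above: users of each loaded pointer receive a fresh scope in `!alias.scope`, and users of the invariant pointers list that scope in `!noalias`. The toy model below sketches how such a pair can be read as "these two accesses do not alias". It is a deliberate simplification that assumes a single scope domain; the `Access` type and the function names are hypothetical and are not LLVM APIs.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Hypothetical model of one memory access and its scoped-noalias metadata.
struct Access {
  std::set<std::string> AliasScope; // scopes this access belongs to
  std::set<std::string> NoAlias;    // scopes this access is declared not to alias
};

// One direction: every scope A belongs to is listed in B's noalias set.
static bool provesNoAliasOneWay(const Access &A, const Access &B) {
  if (A.AliasScope.empty())
    return false; // nothing to reason about
  return std::includes(B.NoAlias.begin(), B.NoAlias.end(),
                       A.AliasScope.begin(), A.AliasScope.end());
}

// The metadata is conclusive if it works in either direction.
bool provesNoAlias(const Access &A, const Access &B) {
  return provesNoAliasOneWay(A, B) || provesNoAliasOneWay(B, A);
}
```

In this model, a user of a loaded pointer tagged with scope `S0` and a user of an invariant pointer carrying `S0` in its noalias set are proven independent, while an access without metadata proves nothing.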

bool LoopVersioningLICM::runOnLoop(Loop *L, LPPassManager &LPM) {		bool LoopVersioningLICM::runOnLoop(Loop *L, LPPassManager &LPM) {
if (skipLoop(L))		if (skipLoop(L))
return false;		return false;
Changed = false;
// Get Analysis information.		// Get Analysis information.
LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();		AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();		SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();
DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
LAA = &getAnalysis<LoopAccessLegacyAnalysis>();		LAA = &getAnalysis<LoopAccessLegacyAnalysis>();
LAI = nullptr;		LAI = nullptr;
// Set Current Loop		// Set Current Loop
CurLoop = L;		CurLoop = L;
// Get the preheader block.		LoadedPointers.clear();
Preheader = L->getLoopPreheader();		LoopInvPtrs.clear();
// Initial allocation		CurAST.reset(new AliasSetTracker(*AA));
CurAST = new AliasSetTracker(*AA);
eastig (author, inline comment): This is not safe. Replaced with unique_ptr.

		LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
// Loop over the body of this loop, construct AST.		// Loop over the body of this loop, construct AST.
for (auto *Block : L->getBlocks()) {		for (auto *Block : L->getBlocks()) {
if (LI->getLoopFor(Block) == L) // Ignore blocks in subloop.		if (LI->getLoopFor(Block) == L) // Ignore blocks in subloop.
CurAST->add(*Block); // Incorporate the specified basic block		CurAST->add(*Block); // Incorporate the specified basic block
}		}

		bool Changed = false;

// Check feasiblity of LoopVersioningLICM.		// Check feasiblity of LoopVersioningLICM.
// If versioning found to be feasible and beneficial then proceed		// If versioning found to be feasible and beneficial then proceed
// else simply return, by cleaning up memory.		// else simply return, by cleaning up memory.
if (isLegalForVersioning()) {		if (isLegalForVersioning()) {
// Do loop versioning.		// Do loop versioning.
// Create memcheck for memory accessed inside loop.		// Create memcheck for memory accessed inside loop.
// Clone original loop, and set blocks properly.		// Clone original loop, and set blocks properly.
		DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
LoopVersioning LVer(*LAI, CurLoop, LI, DT, SE, true);		LoopVersioning LVer(*LAI, CurLoop, LI, DT, SE, true);
LVer.versionLoop();		BasicBlock *RTCheckingBB = LVer.versionLoop();
		addLoadedPointersRTChecks(RTCheckingBB);
// Set Loop Versioning metaData for original loop.		// Set Loop Versioning metaData for original loop.
addStringMetadataToLoop(LVer.getNonVersionedLoop(), LICMVersioningMetaData);		addStringMetadataToLoop(LVer.getNonVersionedLoop(), LICMVersioningMetaData);
// Set Loop Versioning metaData for version loop.		// Set Loop Versioning metaData for version loop.
addStringMetadataToLoop(LVer.getVersionedLoop(), LICMVersioningMetaData);		addStringMetadataToLoop(LVer.getVersionedLoop(), LICMVersioningMetaData);

		if (LoadedPointers.empty()) {
// Set "llvm.mem.parallel_loop_access" metaData to versioned loop.		// Set "llvm.mem.parallel_loop_access" metaData to versioned loop.
addStringMetadataToLoop(LVer.getVersionedLoop(),		addStringMetadataToLoop(LVer.getVersionedLoop(),
"llvm.mem.parallel_loop_access");		"llvm.mem.parallel_loop_access");
// Update version loop with aggressive aliasing assumption.		}
setNoAliasToLoop(LVer.getVersionedLoop());		annotateLoopWithNoAlias(LVer);
Changed = true;		Changed = true;
}		}
// Delete allocated memory.
delete CurAST;
return Changed;		return Changed;
}		}
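
Note on the `CurAST` change above: the patch replaces the manual `new`/`delete` pair with `CurAST.reset(new AliasSetTracker(*AA))`. The sketch below illustrates the ownership behaviour this relies on, using a stand-in `Tracker` type and a `ToyPass` class that are assumptions for illustration only; it does not reproduce the real pass.

```cpp
#include <memory>

// Stand-in for AliasSetTracker that counts live instances.
struct Tracker {
  static int Live;
  Tracker() { ++Live; }
  ~Tracker() { --Live; }
};
int Tracker::Live = 0;

// Hypothetical pass skeleton: the member owns the tracker.
class ToyPass {
  std::unique_ptr<Tracker> CurAST;

public:
  bool runOnLoop(bool Legal) {
    // reset() destroys the tracker from the previous run, if any.
    CurAST.reset(new Tracker());
    if (!Legal)
      return false; // no explicit delete needed on any return path
    return true;
  }
};
```

Every return path, including ones added later, leaves at most one tracker alive, and the destructor of the pass releases the last one.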

char LoopVersioningLICM::ID = 0;		char LoopVersioningLICM::ID = 0;
INITIALIZE_PASS_BEGIN(LoopVersioningLICM, "loop-versioning-licm",		INITIALIZE_PASS_BEGIN(LoopVersioningLICM, "loop-versioning-licm",
"Loop Versioning For LICM", false, false)		"Loop Versioning For LICM", false, false)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
(11 lines not shown)

lib/Transforms/Utils/LoopVersioning.cpp

(46 lines not shown)
void LoopVersioning::setAliasChecks(		void LoopVersioning::setAliasChecks(
SmallVector<RuntimePointerChecking::PointerCheck, 4> Checks) {		SmallVector<RuntimePointerChecking::PointerCheck, 4> Checks) {
AliasChecks = std::move(Checks);		AliasChecks = std::move(Checks);
}		}

void LoopVersioning::setSCEVChecks(SCEVUnionPredicate Check) {		void LoopVersioning::setSCEVChecks(SCEVUnionPredicate Check) {
Preds = std::move(Check);		Preds = std::move(Check);
}		}

void LoopVersioning::versionLoop(		BasicBlock* LoopVersioning::versionLoop(
const SmallVectorImpl<Instruction *> &DefsUsedOutside) {		const SmallVectorImpl<Instruction *> &DefsUsedOutside) {
Instruction *FirstCheckInst;		Instruction *FirstCheckInst;
Instruction *MemRuntimeCheck;		Instruction *MemRuntimeCheck;
Value *SCEVRuntimeCheck;		Value *SCEVRuntimeCheck;
Value *RuntimeCheck = nullptr;		Value *RuntimeCheck = nullptr;

// Add the memcheck in the original preheader (this is empty initially).		// Add the memcheck in the original preheader (this is empty initially).
BasicBlock *RuntimeCheckBB = VersionedLoop->getLoopPreheader();		BasicBlock *RuntimeCheckBB = VersionedLoop->getLoopPreheader();
(50 lines not shown)

// The loops merge in the original exit block. This is now dominated by the		// The loops merge in the original exit block. This is now dominated by the
// memchecking block.		// memchecking block.
DT->changeImmediateDominator(VersionedLoop->getExitBlock(), RuntimeCheckBB);		DT->changeImmediateDominator(VersionedLoop->getExitBlock(), RuntimeCheckBB);

// Adds the necessary PHI nodes for the versioned loops based on the		// Adds the necessary PHI nodes for the versioned loops based on the
// loop-defined values used outside of the loop.		// loop-defined values used outside of the loop.
addPHINodes(DefsUsedOutside);		addPHINodes(DefsUsedOutside);
		return RuntimeCheckBB;
}		}

void LoopVersioning::addPHINodes(		void LoopVersioning::addPHINodes(
const SmallVectorImpl<Instruction *> &DefsUsedOutside) {		const SmallVectorImpl<Instruction *> &DefsUsedOutside) {
BasicBlock *PHIBlock = VersionedLoop->getExitBlock();		BasicBlock *PHIBlock = VersionedLoop->getExitBlock();
assert(PHIBlock && "No single successor to loop exit block");		assert(PHIBlock && "No single successor to loop exit block");
PHINode *PN;		PHINode *PN;

(194 lines not shown)

test/Transforms/LoopVersioningLICM/copying-bytes-loop-01.ll

This file was added.

				; RUN: opt < %s -S -loop-versioning-licm -scoped-noalias -licm 2>&1 | FileCheck %s

				; CHECK: while.body.lver.check:{{.*}}
				; CHECK: %ptr0 = load i8{{.*}}
				; CHECK: {{.*}}p0e = getelementptr i8{{.*}}
				; CHECK: {{.*}}= icmp ule i8*{{.*}}
				; CHECK: {{.*}}= icmp ule i8*{{.*}}
				; CHECK: while.body.ph.lver.orig:{{.*}}
				; CHECK: while.body.lver.orig:{{.*}}
				; CHECK: while.body.ph:{{.*}}
				; CHECK-NEXT: {{.*}}= load i8*{{.*}}
				; CHECK: while.body:{{.*}}
				; CHECK: while.end.loopexit.loopexit:{{.*}}
				; CHECK: store i8*{{.*}}

				define void @g(i8* nocapture %dst, i8** nocapture %src) nounwind {
				entry:
				%call = tail call i32 @f()
				%tobool3 = icmp eq i32 %call, 0
				br i1 %tobool3, label %while.end, label %while.body.preheader

				while.body.preheader: ; preds = %entry
				br label %while.body

				while.body: ; preds = %while.body.preheader, %while.body
				%bytes_to_read.05 = phi i32 [ %dec, %while.body ], [ %call, %while.body.preheader ]
				%dst.addr.04 = phi i8* [ %incdec.ptr1, %while.body ], [ %dst, %while.body.preheader ]
				%dec = add nsw i32 %bytes_to_read.05, -1
				%0 = load i8*, i8** %src, align 4
				%incdec.ptr = getelementptr inbounds i8, i8* %0, i32 1
				store i8* %incdec.ptr, i8** %src, align 4
				%1 = load i8, i8* %0, align 1
				%incdec.ptr1 = getelementptr inbounds i8, i8* %dst.addr.04, i32 1
				store i8 %1, i8* %dst.addr.04, align 1
				%tobool = icmp eq i32 %dec, 0
				br i1 %tobool, label %while.end.loopexit, label %while.body

				while.end.loopexit: ; preds = %while.body
				br label %while.end

				while.end: ; preds = %while.end.loopexit, %entry
				ret void
				}

				declare i32 @f()

test/Transforms/LoopVersioningLICM/copying-bytes-loop-02.ll

This file was added.

				; RUN: opt < %s -S -loop-versioning-licm -scoped-noalias -licm 2>&1 | FileCheck %s

				; CHECK: while.body.lver.check:{{.*}}
				; CHECK: %ptr0 = load i8{{.*}}
				; CHECK: {{.*}}p0e = getelementptr i8{{.*}}
				; CHECK: {{.*}}= icmp ule i8*{{.*}}
				; CHECK: {{.*}}= icmp ule i8*{{.*}}
				; CHECK: while.body.ph.lver.orig:{{.*}}
				; CHECK: while.body.lver.orig:{{.*}}
				; CHECK: while.body.ph:{{.*}}
				; CHECK-NEXT: {{.*}}= load i8*{{.*}}
				; CHECK: while.body:{{.*}}
				; CHECK: while.end.loopexit.loopexit:{{.*}}
				; CHECK: store i8*{{.*}}

				%struct.S = type { i8*, i8*, i8* }

				define void @g(%struct.S* nocapture %buf, i8* nocapture readonly %src) nounwind {
				entry:
				%call = tail call i32 @f()
				%tobool3 = icmp eq i32 %call, 0
				br i1 %tobool3, label %while.end, label %while.body.lr.ph

				while.body.lr.ph: ; preds = %entry
				%pos = getelementptr inbounds %struct.S, %struct.S* %buf, i32 0, i32 1
				br label %while.body

				while.body: ; preds = %while.body, %while.body.lr.ph
				%bytes_to_read.05 = phi i32 [ %call, %while.body.lr.ph ], [ %dec, %while.body ]
				%src.addr.04 = phi i8* [ %src, %while.body.lr.ph ], [ %incdec.ptr, %while.body ]
				%dec = add nsw i32 %bytes_to_read.05, -1
				%incdec.ptr = getelementptr inbounds i8, i8* %src.addr.04, i32 1
				%0 = load i8, i8* %src.addr.04, align 1
				%1 = load i8*, i8** %pos, align 4
				%incdec.ptr1 = getelementptr inbounds i8, i8* %1, i32 1
				store i8* %incdec.ptr1, i8** %pos, align 4
				store i8 %0, i8* %1, align 1
				%tobool = icmp eq i32 %dec, 0
				br i1 %tobool, label %while.end.loopexit, label %while.body

				while.end.loopexit: ; preds = %while.body
				br label %while.end

				while.end: ; preds = %while.end.loopexit, %entry
				ret void
				}

				declare i32 @f()

test/Transforms/LoopVersioningLICM/loopversioningLICM1.ll

	; RUN: opt < %s -O1 -S -loop-versioning-licm -licm -debug-only=loop-versioning-licm 2>&1 | FileCheck %s			; RUN: opt < %s -O1 -S -loop-versioning-licm -licm -debug-only=loop-versioning-licm 2>&1 | FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts
	;			;
	; Test to confirm loop is a candidate for LoopVersioningLICM.			; Test to confirm loop is a candidate for LoopVersioningLICM.
	; It also confirms invariant moved out of loop.			; It also confirms invariant moved out of loop.
	;			;
	; CHECK: Loop: Loop at depth 2 containing: %for.body3<header><latch><exiting>			; CHECK: Loop: Loop at depth 2 containing: %for.body3<header><latch><exiting>
	; CHECK-NEXT: Loop Versioning found to be beneficial			; CHECK-NEXT: Loop Versioning found to be beneficial
	;			;
	; CHECK: for.body3:			; CHECK: for.body3:
	; CHECK-NEXT: %[[induction:.*]] = phi i32 [ %arrayidx7.promoted, %for.body3.ph ], [ %add8, %for.body3 ]			; CHECK-NEXT: %[[induction:.*]] = phi i32 [ %arrayidx7.promoted, %for.body3.ph ], [ %add8, %for.body3 ]
	; CHECK-NEXT: %j.113 = phi i32 [ %j.016, %for.body3.ph ], [ %inc, %for.body3 ]			; CHECK-NEXT: %j.113 = phi i32 [ %j.016, %for.body3.ph ], [ %inc, %for.body3 ]
	; CHECK-NEXT: %idxprom = zext i32 %j.113 to i64			; CHECK-NEXT: %idxprom = zext i32 %j.113 to i64
	; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, i32* %var1, i64 %idxprom			; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, i32* %var1, i64 %idxprom
	; CHECK-NEXT: store i32 %add, i32* %arrayidx, align 4, !alias.scope !2, !noalias !2			; CHECK-NEXT: store i32 %add, i32* %arrayidx, align 4, !alias.scope !{{[0-9]+}}
	eastig (author, inline comment): Meaningless metadata. The instruction is not aliased with itself by default. Also, it's better not to hard-code metadata IDs, as they can change.
	; CHECK-NEXT: %add8 = add nsw i32 %[[induction]], %add			; CHECK-NEXT: %add8 = add nsw i32 %[[induction]], %add
	; CHECK-NEXT: %inc = add nuw i32 %j.113, 1			; CHECK-NEXT: %inc = add nuw i32 %j.113, 1
	; CHECK-NEXT: %cmp2 = icmp ult i32 %inc, %itr			; CHECK-NEXT: %cmp2 = icmp ult i32 %inc, %itr
	; CHECK-NEXT: br i1 %cmp2, label %for.body3, label %for.inc11.loopexit.loopexit6, !llvm.loop !5			; CHECK-NEXT: br i1 %cmp2, label %for.body3, label %for.inc11.loopexit.loopexit6, !llvm.loop !{{[0-9]+}}
	define i32 @foo(i32* nocapture %var1, i32* nocapture readnone %var2, i32* nocapture %var3, i32 %itr) #0 {			define i32 @foo(i32* nocapture %var1, i32* nocapture readnone %var2, i32* nocapture %var3, i32 %itr) #0 {
	entry:			entry:
	%cmp14 = icmp eq i32 %itr, 0			%cmp14 = icmp eq i32 %itr, 0
	br i1 %cmp14, label %for.end13, label %for.cond1.preheader.preheader			br i1 %cmp14, label %for.end13, label %for.cond1.preheader.preheader

	for.cond1.preheader.preheader: ; preds = %entry			for.cond1.preheader.preheader: ; preds = %entry
	br label %for.cond1.preheader			br label %for.cond1.preheader

	(40 lines not shown)

test/Transforms/LoopVersioningLICM/loopversioningLICM2.ll

	; RUN: opt < %s -O1 -S -loop-versioning-licm -licm -debug-only=loop-versioning-licm -disable-loop-unrolling 2>&1 | FileCheck %s			; RUN: opt < %s -O1 -S -loop-versioning-licm -licm -debug-only=loop-versioning-licm -disable-loop-unrolling 2>&1 | FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts
	;			;
	; Test to confirm loop is a good candidate for LoopVersioningLICM			; Test to confirm loop is a good candidate for LoopVersioningLICM
	; It also confirms invariant moved out of loop.			; It also confirms invariant moved out of loop.
	;			;
	; CHECK: Loop: Loop at depth 2 containing: %for.body3.us<header><latch><exiting>			; CHECK: Loop: Loop at depth 2 containing: %for.body3.us<header><latch><exiting>
	; CHECK-NEXT: Loop Versioning found to be beneficial			; CHECK-NEXT: Loop Versioning found to be beneficial
	;			;
	; CHECK: for.cond1.for.inc17_crit_edge.us.loopexit5: ; preds = %for.body3.us			; CHECK: for.cond1.for.inc17_crit_edge.us.loopexit5: ; preds = %for.body3.us
	; CHECK-NEXT: %add14.us.lcssa = phi float [ %add14.us, %for.body3.us ]			; CHECK-NEXT: %add14.us.lcssa = phi float [ %add14.us, %for.body3.us ]
	; CHECK-NEXT: store float %add14.us.lcssa, float* %arrayidx.us, align 4, !alias.scope !0, !noalias !0			; CHECK-NEXT: store float %add14.us.lcssa, float* %arrayidx.us, align 4, !alias.scope !{{[0-9]+}}
	; CHECK-NEXT: br label %for.cond1.for.inc17_crit_edge.us			; CHECK-NEXT: br label %for.cond1.for.inc17_crit_edge.us
	;			;
	define i32 @foo(float* nocapture %var2, float** nocapture readonly %var3, i32 %itr) #0 {			define i32 @foo(float* nocapture %var2, float** nocapture readonly %var3, i32 %itr) #0 {
	entry:			entry:
	%cmp38 = icmp sgt i32 %itr, 1			%cmp38 = icmp sgt i32 %itr, 1
	br i1 %cmp38, label %for.body3.lr.ph.us, label %for.end19			br i1 %cmp38, label %for.body3.lr.ph.us, label %for.end19

	for.body3.us: ; preds = %for.body3.us, %for.body3.lr.ph.us			for.body3.us: ; preds = %for.body3.us, %for.body3.lr.ph.us
	(32 lines not shown)

Revision Contents

Diff 73430

include/llvm/Analysis/LoopAccessAnalysis.h

include/llvm/Transforms/Utils/LoopVersioning.h

lib/Analysis/LoopAccessAnalysis.cpp

lib/Transforms/Scalar/LoopVersioningLICM.cpp

lib/Transforms/Utils/LoopVersioning.cpp

test/Transforms/LoopVersioningLICM/copying-bytes-loop-01.ll

test/Transforms/LoopVersioningLICM/copying-bytes-loop-02.ll

test/Transforms/LoopVersioningLICM/loopversioningLICM1.ll

test/Transforms/LoopVersioningLICM/loopversioningLICM2.ll
