This is an archive of the discontinued LLVM Phabricator instance.

Differential D11530

RFC: LoopEditor, a high-level loop transform toolkit
Needs Review, Public

Authored by jmolloy on Jul 27 2015, 10:03 AM.

Details

Reviewers

anemet
mzolotukhin
aschwaighofer
hfinkel

Summary

This is a prototype API designed to perform or to enable high-level loop transformations. It is my hope that it could be used throughout LLVM's vectorizers and loop transforms as the go-to API for editing loops.

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 30702. Jul 27 2015, 10:03 AM

jmolloy retitled this revision to RFC: LoopEditor, a high-level loop transform toolkit.

jmolloy updated this object.

jmolloy set the repository for this revision to rL LLVM.

Hi Adam, Hal, Michael, Nadav, Arnold,

Thanks for your comments on my RFC thread.

This latest diff is actually for you to review; I have three things here:

1. The latest incarnation of the LoopEditor header/API.
2. A stubbed out .cpp so it all compiles and links.
3. A proof-of-concept showing the LoopEditor applied to the LoopVectorizer and replacing createEmptyBlock().

Only (1) and (2) am I looking to commit; (3) is here as an example of how the LoopEditor might fit. I should mention that (3) looks horrible, and rightly so. There would need to be a bunch of refactoring, and for this proof of concept I've tried to keep the LoC count down. But I would, in the end:

- Move almost all the InnerLoopVectorizer code into the delegate, or more cleanly make InnerLoopVectorizer a subclass of Delegate.
- Start by just using LoopEditor to replace createEmptyBlock and let LoopVectorizer do most of the cloning itself.
- Then start to de-duplicate a lot of the cloning functionality - removing the unroll factor completely from the LoopVectorizer class as a start (LoopEditor can do all that itself), then moving from LoopVectorizer managing its own PHIs to LoopEditor doing that for it.
- By the end, hopefully having something substantially cleaner with a very distinct and testable API.

Specific improvements in the API since my RFC:

- Add the minimal hook functions that the LoopVectorizer needs to the delegate.
- Add the concept of an "AnalysisLevel" - different mutations or analyses on a loop may depend on being able to analyze more or less information about the loop. For example, loop cloning (as used by loop distribution) doesn't need to identify all recurrences, but loop vectorization does.
- Revamp Reductions and Inductions and how they're represented, with a proper class hierarchy.
- Streamline the API a tad.
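
To make the "AnalysisLevel" idea above concrete, here is a minimal standalone sketch. The enumerator names, `LoopFacts`, and `satisfies` are hypothetical and do not come from the patch; the only assumption carried over from the description is that the levels are ordered, so a transform can state the least analysis it needs.

```cpp
#include <cassert>

// Hypothetical ordered analysis levels; the names are illustrative only.
enum class AnalysisLevel {
  Structure = 0,      // CFG shape only: enough for plain loop cloning.
  Inductions = 1,     // Induction PHIs have been identified as well.
  AllRecurrences = 2  // Every header PHI is classified: needed to vectorize.
};

// Hypothetical record of how far analysis got on one particular loop.
struct LoopFacts {
  AnalysisLevel Achieved;
};

// A transform may run only if the loop was analyzed at least as deeply as
// the transform requires.
inline bool satisfies(const LoopFacts &Facts, AnalysisLevel Required) {
  return static_cast<int>(Facts.Achieved) >= static_cast<int>(Required);
}
```

Under this model a loop that could only be analyzed structurally can still be cloned for loop distribution, while a vectorizer would ask for `AllRecurrences` and bail out if that level was not reached.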

Please let me know your thoughts!

James

jmolloy added reviewers: anemet, aschwaighofer, mzolotukhin, hfinkel, reames. Aug 4 2015, 7:17 AM

jmolloy added a subscriber: llvm-commits.

Hi all,

Does anyone have any thoughts on the API design? I'm keen to get cracking on this.

Cheers,

James

Hi James,

I've started looking at this and should get back to you today or tomorrow.

Adam

I think this looks quite useful.

One thing that is not obvious to me is what the LoopEditor will do for loops that are not inner loops. Are any of these operations inner-loop only?

Also, how will if-conversion (during vectorization, etc.) work?

James,

I haven't looked at the patch, just getting my bearings...

Is this just for the vectorizer, or are you planning to extend this to *any* loop transformation?

We have some shared methods in lib/Transforms/Utils/LoopUtils.cpp, which you don't seem to touch. Are you planning on joining all loop transformations into one toolkit, and then using it for the vectorizer, unroller, simplify, rotation, etc.?

cheers,
--renato

I'm failing to see (partially because it's my first time looking at it) how your new hooks will make sure all the pre-analysis is done before the actual vectorization. But it seems Adam and others are more familiar with it, so I'll just let them comment about that.

Having said that, I welcome any change that makes it easier to share code, and it now seems to me that other loop passes will be able to use the same hooks and reduce their complexity as well as you did in the vectorizer.

cheers,
--renato

lib/Target/ARM/Thumb2ITBlockPass.cpp
186 ↗	(On Diff #31326)	oops

Hi James,

Now, I read the whole patch and I guess I am still more-or-less left with my original questions from the RFC thread.

Can we further decompose the functionality beyond LoopEditor? One straw-man I was thinking of is to focus on the widening functionality. Both interleaving and loop vectorization could be considered widening, differing only in the means by which they achieve it (vectorizing or duplicating instructions). Then obviously the hook could provide the instruction-specific widening operation and how reduction values should be recovered at the exit of the loop.

I would also encourage you to use LoopVersioning. The idea was certainly to convert the vectorizer to be another client of LVer. If you need to add hooks or whatnot, feel free. The plan for LoopVersioning has always been to go beyond just alias checks (we have immediate plans to add dynamic loop trip-count checks), so having to reimplement this in various places would be a mistake now that I have finally refactored it.

I would also think that it would be a design mistake to re-implement LoopVersioning on top of something more complex, i.e. LoopEditor. I would rather have classes that are as simple as possible for things like LoopVersioning, LoopWidening, LoopPeeling, etc., and then compose transformations using these classes. This would make things more explicit and easier to reason about for correctness: input, output state, required analyses, modified analyses. (Having AnalysisLevel in the class is an indication to me that the class is not properly decomposed.)

It is also unclear to me why you want to pin down the API by committing it with the implementation stubbed out. I think it's good to discuss the end goal, but why can't we invent and refine the API gradually as you start refactoring and transitioning over the existing functionality?

Thanks,
Adam

In D11530#220815, @anemet wrote:

I would rather have classes that are as simple as possible for things like LoopVersioning, LoopWidening, LoopPeeling, etc., and then compose transformations using these classes. This would make things more explicit and easier to reason about for correctness: input, output state, required analyses, modified analyses.

I agree with this statement. Having multiple, independent, focused tools, used by multiple, independent passes is a better design.

It may, however, need some redundant information about the state of things on each tool / user, but I think we can manage it.

cheers,
--renato

Hi Adam, Renato,

Thanks for your review and sorry for the time it took to get back to you.

I think I agree with all your comments. The LoopEditor structure is a bit of a monolith - I hadn't really noticed as I was designing it. I like your idea of composable operations more (and it shows that I should re-read a design patterns book sometime soon!). I confess that I did see LoopVersioning as a trivial user of LoopEditor, but I do understand your reservations here.

I'm also not pushing for the entire API to be pushed in one go - it was just, as you said, describing the end goal.

So it looks like this becomes:

1. Improvements to LoopVersioning to make LAA optional (choosing one loop or another shouldn't be tied to LAA - any predicate at all should do fine).
2. [My own requirement] Make sure loopversioning works with non-leaf loops.
3. Create a new class LoopWidening.
   - I feel like this should be an abstract base class with concrete subclasses "LoopVectorizing" and "LoopInterleaving".
4. [Later] create a similar LoopPeeling class, that hopefully should sit on top of the other two.
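
The proposed LoopWidening split can be pictured with the following toy sketch. It is not code from the patch or from LLVM: the loop body is modelled as a list of strings, and every member name (`widenLoop`, `widenInstruction`, `Factor`) is an assumption made for illustration. It only shows the shape of an abstract base class that owns the shared traversal while the two subclasses supply the per-instruction strategy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a loop body: one string per scalar instruction.
using Body = std::vector<std::string>;

// Hypothetical base class: owns the shared traversal and leaves the
// per-instruction strategy to subclasses.
class LoopWidening {
public:
  explicit LoopWidening(unsigned Factor) : Factor(Factor) {}
  virtual ~LoopWidening() = default;

  // Shared driver: visit each instruction once and collect whatever the
  // subclass emits for it.
  Body widenLoop(const Body &Scalar) const {
    Body Wide;
    for (const std::string &Inst : Scalar)
      for (const std::string &Out : widenInstruction(Inst))
        Wide.push_back(Out);
    return Wide;
  }

protected:
  virtual Body widenInstruction(const std::string &Inst) const = 0;
  unsigned Factor;
};

// Widens by emitting Factor scalar copies of each instruction.
class LoopInterleaving : public LoopWidening {
public:
  using LoopWidening::LoopWidening;

protected:
  Body widenInstruction(const std::string &Inst) const override {
    Body Copies;
    for (unsigned Part = 0; Part < Factor; ++Part)
      Copies.push_back(Inst + "." + std::to_string(Part));
    return Copies;
  }
};

// Widens by emitting a single instruction tagged with Factor lanes.
class LoopVectorizing : public LoopWidening {
public:
  using LoopWidening::LoopWidening;

protected:
  Body widenInstruction(const std::string &Inst) const override {
    return {Inst + ".v" + std::to_string(Factor)};
  }
};
```

For a body of `load` and `add`, `LoopInterleaving(2)` would yield four scalar copies and `LoopVectorizing(4)` two vector instructions; everything else about the traversal is shared.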

The names are up for bikeshedding.

This seems to make review much easier. What do you think of this as a plan?

Cheers,

James

Hi James,

Thanks very much for considering my comments. I agree with this direction overall. I have a few specific comments below:

So it looks like this becomes:

Improvements to LoopVersioning to make LAA optional (choosing one loop or another shouldn't be tied to LAA - any predicate at all should do fine).

Agreed. Silviu's Assumption-based SCEV is already proposing to use LoopVer to host overflow checks. I also have plans to add dynamic trip-count checks to allow a higher number of checks when we know the higher trip count will justify the additional overhead.
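
A rough way to picture versioning that is not tied to LAA is a helper that accepts arbitrary runtime predicates and picks the optimized version only when all of them hold. The sketch below is a toy model under that assumption; `ToyLoopVersioning`, `RuntimeInfo`, and `RuntimeCheck` are hypothetical names and do not reflect the real LoopVersioning interface.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Runtime facts a check may consult; the fields are illustrative only.
struct RuntimeInfo {
  std::uint64_t TripCount;
  bool PointersMayAlias;
};

using RuntimeCheck = std::function<bool(const RuntimeInfo &)>;

enum class Version { Optimized, Original };

// Hypothetical versioning helper: it knows nothing about where its checks
// come from, only that all of them must pass.
class ToyLoopVersioning {
public:
  void addCheck(RuntimeCheck C) { Checks.push_back(std::move(C)); }

  // Falls back to the original loop as soon as one check fails.
  Version select(const RuntimeInfo &RI) const {
    for (const RuntimeCheck &C : Checks)
      if (!C(RI))
        return Version::Original;
    return Version::Optimized;
  }

private:
  std::vector<RuntimeCheck> Checks;
};
```

An alias check, an overflow check, and a trip-count threshold would then all be registered the same way, which is the property the discussion asks for.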

[My own requirement] Make sure loopversioning works with non-leaf loops.

Makes sense.

Create a new class LoopWidening.

I feel like this should be an abstract base class with concrete subclasses "LoopVectorizing" and "LoopInterleaving".

There are certainly commonalities between interleaving and vectorization that we may need to be able to leverage by delegating the differences to hooks. I guess the way this shapes up will depend on the specifics of how the code is split out from the Loop Vectorizer. Initially it will probably be pretty close to the structure of the code in the vectorizer.

[Later] create a similar LoopPeeling class, that hopefully should sit on top of the other two.

I am not sure there is much code to be shared between LoopPeeling and Widening. Why do you think so?

The names are up for bikeshedding.

Just for the record, I added the 'ing' ending to LoopVersioning to stress that I mean "version", the verb rather than the noun.

Thanks again,
Adam

Hi Adam,

I'm glad we're converging on a way forward!

I am not sure there is much code to be shared between LoopPeeling and Widening. Why do you think so?

Commonalities would be cloning all the blocks in a loop, detecting and
hooking up reductions/inductions, and modifying the loop trip count. But
you're right, "on top of" was not the right thing to say.

Cheers,

James

Hi Adam,

I've been working on this for the past few days and wanted to give an update. The most interesting/annoying thing about this all is exactly how to split out/share code with the loopvectorizer.

My ideal would be to take the LoopVectorizer code and refactor it piece by piece until it's in a modular enough shape to encapsulate in a utility class. This would mean we'd maintain 100% test coverage throughout the process.

Unfortunately the code in question is a bit of a monolith and has resisted all of my attempts at chiselling into better shape. The main problem is that any sane composition model would perform a sequence of operations on a loop, mutating it in place. This is the model I'd like to get to. But the current model is to do one pass over the source loop, injecting any cloned instructions into a single IRBuilder.

This has meant that everything we do in the loop vectorizer happens in one place. If-conversion is done on-the-fly, as are induction PHI updates and remaps. Because of this it's very difficult to isolate one piece of functionality and refactor it - it's all or nothing.

So what I'd like to do is to incrementally create the helper classes LoopWidening/LoopInterleaving/LoopVectorizing separately. In order to get test coverage and move towards an end goal, I'd hook the new helper classes into LoopVectorize and make LoopVectorize use the new classes when possible. Something like this:

if (!VectorizeLoop) {
  assert(IC > 1 && "interleave count should not be 1 or 0");
  // If we decided that it is not legal to vectorize the loop then
  // interleave it.
  if (canBeHandledByLoopInterleaving()) {
    LoopVersioning LVer(L);
    LVer.versionLoop();
    LoopInterleaving LInt(LVer.getVersionedLoop());
    LInt.widenLoop();
  } else {
    InnerLoopUnroller Unroller(L, SE, LI, DT, TLI, TTI, IC);
    Unroller.vectorize(&LVL);
  }
  emitOptimizationRemark(F->getContext(), DEBUG_TYPE, *F, L->getStartLoc(),
                         Twine("interleaved loop (interleaved count: ") +
                             Twine(IC) + ")");
} else {
  ...

Initially the new codepath would be taken only when the loop is very simple; it has only a single block, for example (or a debug option forced it).

I'm still not 100% happy that I haven't found a way to refactor the code already there :( What do you think? I don't want to spend a massive amount of time going in the wrong direction.

Cheers,

James

Hi James,

Thanks for pushing this forward; I think we really need to clean this up!

I don't think we'd like to have two independent vectorizers at the same time: 1) the old one, and 2) one based on the LoopEditor framework and invoked on simple loops (to begin with). The problem with that approach is that you're actually rewriting the vectorizer from scratch, which should really be a last resort. While I do admit that the code might not be ideal in places, I don't think we need to completely rewrite it. We can't be sure that the two will converge in the foreseeable future, and in attempting to converge them we don't know how the LoopEditor implementation will evolve once it covers all the corner cases that the existing vectorizer supports.

So, I think the best way is to actually refactor the existing code in small incremental steps (and by small I don't mean the number of lines changed - I mean the number of key points changed). For instance, you introduced a new class Recurrence - why not do that in the original code? This way the existing code would become closer to what you want in the end, and thus your patch would become smaller. I understand that it's not as easy as it sounds, but I think that's the proper way of making such changes. Another benefit of this approach is that it would be much easier to review, since NFC changes would be separated from new features, and thus we could both check NFC changes for possible bugs more thoroughly and discuss the features at a higher level.

That said, in the end I would like to see an API similar to what you proposed. I just think that growing it in parallel might backfire.

Thanks,
Michael

Unfortunately the code in question is a bit of a monolith and has resisted all of my attempts at chiselling into better shape. The main problem is that any sane composition model would perform a sequence of operations on a loop, mutating it in place. This is the model I'd like to get to. But the current model is to do one pass over the source loop, injecting any cloned instructions into a single IRBuilder.

Can you please elaborate on the difficulties a bit more? Perhaps working through an example would help here.

Thanks,
Adam

Hi Adam, Michael,

On Friday I tried again, intending on reporting to you exactly why I failed again... and succeeded :)

I've now got a bunch of patches, culminating in changing LoopVectorize to use LoopVersioning, and I intend to refactor a bunch more too. The patches I've got up for review currently are:

http://reviews.llvm.org/D12284
http://reviews.llvm.org/D12285
http://reviews.llvm.org/D12286
http://reviews.llvm.org/D12289

Those are the ones that aren't quite NFC. I've got a bunch more that just do code cleanup (enabled by those patches) and are trivial.

I still need to throw more testing at these patches, but they all pass regression tests at least.

Cheers,

James

reames resigned from this revision. Oct 8 2015, 10:29 AM

reames removed a reviewer: reames.

Revision Contents

Path                                          Size
include/llvm/Transforms/Utils/LoopEditor.h    379 lines
lib/Transforms/Utils/LoopEditor.cpp           1003 lines

Diff 30702

include/llvm/Transforms/Utils/LoopEditor.h

This file was added.

				//===-- LoopEditor.h - High-level loop transformations --------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// The LoopEditor provides a toolkit for performing high-level transforms on
				// loops.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_SCALAR_UTILS_LOOPEDITOR_H
				#define LLVM_TRANSFORMS_SCALAR_UTILS_LOOPEDITOR_H

				#include <llvm/ADT/ArrayRef.h>
				#include <llvm/ADT/DenseMap.h>
				#include <llvm/ADT/SmallVector.h>
				#include <llvm/ADT/Twine.h>
				#include <llvm/IR/IRBuilder.h>
				#include <llvm/Transforms/Utils/ValueMapper.h>
				#include <functional>
				#include <map>

				namespace llvm {

				class BasicBlock;
				class DataLayout;
				class DominatorTree;
				class Loop;
				class LoopInfo;
				class PHINode;
				class SCEV;
				class ScalarEvolution;
				class Value;

				/// \brief This class provides a toolkit to create high-level operations on
				/// loops.
				///
				/// The functionality consists of a set of core functions which are designed
				/// to be composable, and a set of helper routines that stitch together
				/// the core functions to provide higher-level constructions such as loop
				/// versioning, or loop peeling.
				///
				/// Many functions can take a Delegate object, which allows the user to be
				/// notified when different events happen (such as for example "an instruction
				/// has been interleaved"). This allows the core functionality to remain
				/// simple and not need too much configuration, while allowing the user
				/// the flexibility to hook into the internals. Some delegate methods offer
				/// the user the chance to influence the edit operation.
				///
				/// All of the core functions are designed to keep LoopInfo, ScalarEvolution
				/// and DominatorTree up to date.
				class LoopEditor {
				/// Internal class representing a PHI node inside the header block of the
				/// loop.
				class PHIID {
				protected:
				/// Default constructor purely for operator[] on maps.
				PHIID() {}
				PHIID(PHINode *BO, Value *StartV, PHINode *Val) :
				BasedOn(BO), StartValue(StartV), Val(Val) {
				}
				/// If a loop is cloned, its clone will have this member of each of its
				/// recurrences and inductions set to the original recurrence or induction.
				///
				/// This allows LoopEditor to reason about sets of PHIs and how to merge
				/// them together when stitching the outputs of loops to the input of other
				/// loops (such as in LoopEditor::addIncoming).
				PHINode *BasedOn;
				/// For an induction or a reduction, the "start value" of the PHI (the value
				/// incoming from the preheader).
				Value *StartValue;
				/// The PHInode itself.
				PHINode *Val;
				// This class is deliberately opaque to users, but allow LoopEditor full
				// access.
				friend class LoopEditor;
				};
				public:
				/// A semi-opaque descriptor for a reduction.
				class Reduction : private PHIID {
				// FIXME: Add recurrence here.
				Reduction(Value *StartV, PHINode *Val) :
				PHIID(nullptr, StartV, Val) {
				}
				public:
				/// Only available for operator[] on maps.
				Reduction() { assert(0 && "Unreachable!"); }
				bool operator < (const Reduction &R) const {
				return Val < R.Val;
				}
				/// Return the start value of this reduction.
				Value *getStartValue() const { return StartValue; }
				/// Return the PHI in the loop header that identifies this reduction.
				PHINode *getPHI() const { return Val; }
				/// Emit code to reduce Op1 and Op2 to one value.
				Value *createOp(IRBuilder<> &IRB, Value *Op1, Value *Op2) const;
				// Open up to LoopEditor.
				friend class LoopEditor;
				};
				/// A semi-opaque descriptor for an induction.
				class Induction : private PHIID {
				Induction(Value *StartV, PHINode *Val) :
				PHIID(nullptr, StartV, Val) {
				}
				public:
				/// Only available for operator[] on maps.
				Induction() { assert(0 && "Unreachable!"); }
				bool operator < (const Induction &R) const {
				return Val < R.Val;
				}
				/// Return the start value of this induction.
				Value *getStartValue() const { return StartValue; }
				/// Return the PHI in the loop header that identifies this induction.
				PHINode *getPHI() const { return Val; }
				// Open up to LoopEditor.
				friend class LoopEditor;
				};

				private:
				//
				// Immutable/analysis state
				//
				Loop *L;
				ScalarEvolution *SE;
				const DataLayout *DL;
				LoopInfo *LI;
				DominatorTree *DT;
				bool Analyzed;

				//
				// Mutable, cached state
				//
				SmallVector<Reduction,8> Reductions;
				DenseMap<Value*,Reduction> ReductionsByValue;
				SmallVector<Induction,8> Inductions;
				DenseMap<Value*,Induction> InductionsByValue;
				const SCEV *TripCountSCEV;
				Value *BackedgeCountComparisonValue;
				Instruction *BackedgeCountComparison;
				BasicBlock *Bypass;
				Value *ExecutedTripCountValue;

				void ensureStartValuesAvailable();
				void ensureCanonicalInductionVariable();
				void ensureBypassCreated();
				void analyze();
				void addStartPHIIncoming(PHINode *PN, Value *V, BasicBlock *BB);
				bool sameProvenance(const PHIID &A, const PHIID &B);
				void makeValuesAvailableInBlocks(ArrayRef<Value*> Values,
				ArrayRef<BasicBlock*> Blocks);
				PHINode *getCanonicalInductionVariable();
				void verify(const char *Txt);
				void computeBackedgeCountComparison();
				Value *getStartingValue();
				Value *determineStartValueForPHI(PHINode *PN);
				void removePHINodeIncomingValues(BasicBlock *In, BasicBlock *For);
				void replacePHINodeIncomingValues(BasicBlock *In, BasicBlock *For,
				BasicBlock *With);
				BasicBlock *getDedicatedExitingBlock();
				BasicBlock *computeImmediateDominator(BasicBlock *B);
				LLVMContext &getContext();

				/// Hidden constructor, only used by LoopEditor::clone() to initialize
				/// the Bypass block.
				LoopEditor(Loop *L, BasicBlock *Bypass, ScalarEvolution *SE,
				const DataLayout *DL, LoopInfo *LI, DominatorTree *DT);
				public:
				/// Delegates should be subclassed by users. They can optionally chain
				/// to other delegates for composability.
				///
				/// Delegates provide many callback functions, but only a subset if any
				/// will be invoked for any one core function call.
				///
				/// It is forbidden to modify the content of the loop when servicing one
				/// of the "notify*" callbacks.
				struct Delegate {
				Delegate *Next;

				Delegate() : Next(nullptr) {}
				Delegate(Delegate *Next) : Next(Next) {}
				virtual void anchor(); // Provide a home for the vtable.

				/// A call to interleave() is about to happen, for the given iteration.
				/// Iteration starts at 1 (the zero'th iteration would correspond to
				/// the original loop content).
				///
				/// Called by: widenAndInterleave()
				virtual void notifyInterleaveIterationStarting(unsigned Iteration) {
				if (Next) Next->notifyInterleaveIterationStarting(Iteration);
				}
				/// \c OldInst has been cloned to \c NewInst, during a call to
				/// \c interleave()
				///
				/// Called by: interleave()
				virtual void notifyInstructionInterleaved(Instruction *OldInst,
				Instruction *NewInst) {
				if (Next) Next->notifyInstructionInterleaved(OldInst, NewInst);
				}
				/// \c OldInst has been peeled out of the loop as \c NewInst.
				/// The peel iteration was \c Iteration, which starts from zero.
				///
				/// Called by: peelAfter()
				virtual void notifyInstructionPeeled(Instruction *OldInst,
				Instruction *NewInst,
				unsigned Iteration) {
				if (Next) Next->notifyInstructionPeeled(OldInst, NewInst, Iteration);
				}
				};

				/// A callback type that takes a Value* and IRBuilder, and returns a Value*.
				typedef std::function<Value*(Value*,IRBuilder<>&)> ValueToValueCB;

				/// Constructs a new loop editor, editing L.
				LoopEditor(Loop *L, ScalarEvolution *SE, const DataLayout *DL, LoopInfo *LI,
				DominatorTree *DT);

				///
				/// Queries
				///

				/// Returns true if the loop is analyzable by the loop editor.
				bool isValid() const;

				/// Returns true if I is contained within blocks the loop editor owns.
				/// This is equivalent to Loop::contains(), but it also checks the optional
				/// bypass and dedicated exit blocks that are outside of Loop's knowledge.
				bool contains(Instruction *I);

				/// Allow accesses to the underlying Loop using ->.
				Loop *operator -> () const { return L; }

				/// Returns true if getReductionByPHI(V) would succeed.
				bool isValidReduction(Value *V);

				/// Retrieves an ID by which an existing reduction can be referenced. 'V'
				/// must be a reduction PHINode. Asserts on failure.
				const Reduction getReductionByPHI(Value *V);

				/// Retrieves all known reductions.
				ArrayRef<Reduction> getAllReductions();

				/// Retrieves the outgoing value of a reduction - its value on exit from the
				/// loop.
				Value *getOutgoingReduction(const Reduction &ID);

				/// Retrieves a Reduction valid for this loop, if one is found to be based
				/// on or derived from the given reduction. This will happen if this loop has
				/// been cloned from another loop previously.
				Reduction *getMatchingReduction(const Reduction &R);

				/// Retrieves all known inductions.
				ArrayRef<Induction> getAllInductions();

				/// Retrieves the outgoing value of an induction - its value on exit from the
				/// loop.
				Value *getOutgoingInduction(const Induction &ID);

				/// Retrieves an Induction valid for this loop, if one is found to be based
				/// on or derived from the given induction. This will happen if this loop has
				/// been cloned from another loop previously.
				Induction *getMatchingInduction(const Induction &R);

				/// Retrieves the executed trip count at the end of the loop. This may be
				/// zero if the loop was bypassed by a predicate, or less than the normal
				/// trip count if the loop was widened.
				Value *getExecutedTripCount();

				/// Retrieves the trip count of the loop as a SCEV. Note that this involves
				/// adding one to the backedge count and as such may overflow when expanded.
				const SCEV *getTripCount();

				///
				/// Mutations
				///

				/// Produce a new version of this loop. The new loop is returned in
				/// LoopEditor.
				///
				/// The new loop is inserted into the CFG on IncomingEdge (IncomingEdge is
				/// updated), so if IncomingEdge is A -> B, the result of this function will
				/// be A -> LOOP -> B.
				LoopEditor clone(Use &IncomingEdge, const Twine &NameSuffix = "");

				/// A "widened" loop iterates over the same range but with a wider step.
				/// For example the AddRec {5,1}<%L> widened by a factor 4 becomes {5,4}<%L>.
				///
				/// This is a requirement for loop interleaving and loop vectorization, but
				/// importantly this does not actually do any unrolling or vectorization.
				/// Instead, all induction variables have their steps altered and the trip
				/// count is modified.
				///
				/// Reductions are not altered.
				void widen(unsigned Factor);

				/// Set the loop trip count.
				void setTripCount(const SCEV *TripCount, bool CheckForOverflow);

				/// Adds a new reduction variable starting at StartValue. The reduction does
				/// not do anything until it is connected with connectReduction().
				Reduction addReduction(Value *StartValue);

				/// Connects the backedge of a reduction added with addReduction().
				void connectReduction(const Reduction ID, Value *V);

				/// Creates a new copy of all instructions in the loop body, except control flow
				/// instructions. Does not modify the control flow in the loop at all.
				///
				/// The inserted instructions are interleaved.
				///
				/// 'InductionF' is called when a use of an induction variable is seen.
				/// 'ReductionUseF' is called when a use of a reduction variable is seen.
				/// 'ReductionDefF' is called when a feed of a reduction variable is seen. The
				/// arguments are the newly created Value and the Reduction
				/// it fed in the original loop.
				/// FIXME: Change these callbacks to invokes on the delegate.
				void interleave(ValueToValueCB InductionF, ValueToValueCB ReductionUseF,
				std::function<void(Value*,const Reduction)> ReductionDefF,
				Delegate *D);

				/// Returns a dedicated exiting block, creating one if it does not exist. A
				/// dedicated exiting block is one where the only incoming arcs are from
				/// the loop latch or optionally the bypass block (The bypass block is a
				/// check to see if the loop would actually be executed at least once, as
				/// the trip count check is on the backedge).
				BasicBlock *getOrCreateDedicatedExitingBlock();

				/// Changes the block that is executed after this loop.
				void updateExitBlock(BasicBlock *BB);

				/// Add an incoming edge from the loop L, with the given mappings between our
				/// and L's reductions and inductions.
				///
				/// The InductionIDs and Reductions can either reference this loop or L's.
				void addIncoming(LoopEditor &L,
				const std::map<const Reduction, Value*> &Reductions,
				const std::map<const Induction, Value*> &Inductions);

				/// Add an incoming edge from the basic block BB.
				void addIncoming(BasicBlock *BB);

				/// Remove an incoming edge from BB. The edge must exist.
				void removeIncoming(BasicBlock *BB);

				///
				/// Macro-mutations - These are all helper functions implemented in terms of
				/// the mutations above.
				///

				/// Widens the loop by \c Factor and performs interleaved loop unrolling.
				///
				/// The reductions in the original loop get duplicated \c Factor times,
				/// then reduced at the end of the loop into one value and returned in Ret.
				void widenAndInterleaveLoop(unsigned Factor,
				std::map<const Reduction, Value*> &Ret,
				Delegate *D=nullptr);

				/// Versions, widens, and interleaves the loop by \c Factor.
				///
				/// The original loop is cloned and a widened loop created, which branches
				/// to the original scalar loop as a tail.
				///
				/// PredBB is created as a block to determine whether to branch to the widened
				/// loop or the scalar one (independent of any trip count and overflow checks).
				/// It is initially set up with just one branch:
				/// br i1 true, label %versioned-loop, label %scalar-loop
				LoopEditor versionWidenAndInterleaveLoop(unsigned Factor, BasicBlock *&PredBB,
				Delegate *D=nullptr);

				/// Peels \c Iterations iterations out of the loop and places them after the
				/// loop. The trip count of the loop is decreased to compensate.
				void peelAfter(unsigned Iterations, Delegate *D=nullptr);
				};
				} // end namespace llvm

				#endif
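
The widening arithmetic described in the comments on widen() can be worked through with a small standalone model. The sketch below is not part of the patch: `AddRec` here is a toy struct rather than an LLVM SCEV, and `splitTripCount` encodes an assumption (the widened loop runs floor(TC / Factor) iterations and a scalar tail picks up the remainder) that the header implies but does not spell out.

```cpp
#include <cassert>
#include <cstdint>

// Toy affine recurrence {Start,+,Step}: its value on iteration N is
// Start + N * Step.
struct AddRec {
  std::int64_t Start;
  std::int64_t Step;

  std::int64_t at(std::uint64_t N) const {
    return Start + static_cast<std::int64_t>(N) * Step;
  }
};

// Widening scales the step and leaves the start value untouched.
inline AddRec widen(AddRec AR, unsigned Factor) {
  return {AR.Start, AR.Step * static_cast<std::int64_t>(Factor)};
}

// Assumed trip-count split: whole groups of Factor iterations go to the
// widened loop, the leftover iterations to a scalar tail.
struct TripSplit {
  std::uint64_t WideIterations;
  std::uint64_t ScalarRemainder;
};

inline TripSplit splitTripCount(std::uint64_t TripCount, unsigned Factor) {
  return {TripCount / Factor, TripCount % Factor};
}
```

With the example from the header comment, widening {5,1} by a factor of 4 gives {5,4}, so the induction variable takes the values 5, 9, 13, ... in the widened loop. A trip count of 10 would, under the assumption above, split into two widened iterations and two scalar ones.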

lib/Transforms/Utils/LoopEditor.cpp

This file was added.

				//===-- LoopEditor.cpp - High-level loop transformations ------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// The LoopEditor provides a toolkit for performing high-level transforms on
				// loops.
				//
				//===----------------------------------------------------------------------===//

				#include "LoopEditor.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/ScalarEvolution.h"
				#include "llvm/Analysis/ScalarEvolutionExpander.h"
				#include "llvm/Analysis/ScalarEvolutionExpressions.h"
				#include "llvm/IR/BasicBlock.h"
				#include "llvm/IR/DataLayout.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/Value.h"
				#include "llvm/IR/Verifier.h"
				#include "llvm/Transforms/Utils/Cloning.h"
				#include "llvm/Transforms/Utils/LoopUtils.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/PrettyStackTrace.h"

				#define DEBUG_TYPE "loopeditor"
				using namespace llvm;

				void LoopEditor::Delegate::anchor() {}

				//
				// Helper functions.
				//
				Value *LoopEditor::Reduction::createOp(IRBuilder<> &IRB, Value *Op1, Value *Op2) const {
				assert(0 && "Implement!");
				}

				LLVMContext &LoopEditor::getContext() {
				return L->getHeader()->getParent()->getContext();
				}

				void LoopEditor::ensureStartValuesAvailable() {
				SmallVector<Value*,4> StartValues;
				SmallVector<BasicBlock*,4> IncomingBlocks;
				for (auto &I : Inductions)
				StartValues.push_back(I.StartValue);
				for (auto &I : Reductions)
				StartValues.push_back(I.StartValue);
				for (auto *BB : predecessors(Bypass ? Bypass : L->getLoopPreheader()))
				IncomingBlocks.push_back(BB);
				makeValuesAvailableInBlocks(StartValues, IncomingBlocks);
				}

				void LoopEditor::ensureCanonicalInductionVariable() {
				PHINode *IndVar = getCanonicalInductionVariable();
				assert(IndVar && "Canonical variable insertion not implemented!");

				// FIXME: We may have a canonical indvar, but we also need to check that it
				// is indeed used properly (compared with in the latch).
				}

				void LoopEditor::ensureBypassCreated() {
				if (Bypass)
				return;
				DEBUG(verify("Verification on entry to LoopEditor::ensureBypassCreated()"));
				assert(L->getLoopPreheader()->getSinglePredecessor() && "Multi-predecessor "
				"preheaders not implemented!");
				BasicBlock *Preheader = L->getLoopPreheader();

				Bypass = BasicBlock::Create(getContext(), "edit.bypass",
				Preheader->getParent(),
				Preheader);
				auto *T = Preheader->getSinglePredecessor()->getTerminator();
				T->replaceUsesOfWith(Preheader, Bypass);

				// Initialize the bypass with a tautological compare just to keep everything
				// connected.
				IRBuilder<> IRB(Bypass);
				IRB.CreateCondBr(IRB.getFalse(), L->getExitBlock(), Preheader);

				DT->addNewBlock(Bypass, T->getParent());
				DT->changeImmediateDominator(Preheader, Bypass);
				auto *ExitDom = DT->findNearestCommonDominator(Bypass, L->getExitBlock());
				DT->changeImmediateDominator(L->getExitBlock(), ExitDom);
				if (L->getParentLoop())
				L->getParentLoop()->addBasicBlockToLoop(Bypass, *LI);

				// Move any PHI nodes from the preheader to the bypass.
				for (auto I = Preheader->begin(); isa<PHINode>(I); I = Preheader->begin())
				I->moveBefore(Bypass->begin());

				// If there are any reduction PHIs in the exit block, they need to have an
				// incoming value for the bypass too.
				BasicBlock *ExitB = L->getExitBlock();
				for (auto I = ExitB->begin(); isa<PHINode>(I); ++I) {
				PHINode *PN = cast<PHINode>(I);
				// They're in our exit block, so they must be a reduction if they use
				// a value defined in our loop.
				Value *V = PN->getIncomingValueForBlock(L->getLoopLatch());
				if (!isa<Instruction>(V) || !L->contains(cast<Instruction>(V)))
				continue;

				// This value will be the outgoing reduction edge. Find the original
				// PHI.
				Reduction *R = nullptr;
				for (auto *U : V->users())
				if (isa<Instruction>(U) &&
				cast<Instruction>(U)->getParent() == L->getHeader()) {
				assert(ReductionsByValue.count(U) && "Not a known reduction?");
				R = &ReductionsByValue[U];
				}
				assert(R && "Not a known reduction?");
				PN->addIncoming(R->StartValue, Bypass);
				}

				DEBUG(verify("Verification on exit from LoopEditor::ensureBypassCreated()"));
				}

				PHINode *LoopEditor::getCanonicalInductionVariable() {
				// Loop::getCanonicalInductionVariable() only detects PHIs starting at zero,
				// but we also deal with loops whose start value comes from a PHI in
				// the preheader (and the default start value is zero).
				//
				// Code copied from LoopInfo.cpp.
				BasicBlock *H = L->getHeader();

				BasicBlock *Incoming = nullptr, *Backedge = nullptr;
				pred_iterator PI = pred_begin(H);
				assert(PI != pred_end(H) &&
				"Loop must have at least one backedge!");
				Backedge = *PI++;
				if (PI == pred_end(H)) return nullptr; // dead loop
				Incoming = *PI++;
				if (PI != pred_end(H)) return nullptr; // multiple backedges?

				if (L->contains(Incoming)) {
				if (L->contains(Backedge))
				return nullptr;
				std::swap(Incoming, Backedge);
				} else if (!L->contains(Backedge))
				return nullptr;

				// Loop over all of the PHI nodes, looking for a canonical indvar.
				for (BasicBlock::iterator I = H->begin(); isa<PHINode>(I); ++I) {
				PHINode *PN = cast<PHINode>(I);
				if (ConstantInt *CI =
				dyn_cast<ConstantInt>(PN->getIncomingValueForBlock(Incoming))) {
				if (!CI->isNullValue())
				continue;
				} else if (InductionsByValue.count(PN) &&
				isa<ConstantInt>(InductionsByValue[PN].StartValue)) {
				if (!cast<ConstantInt>(InductionsByValue[PN].StartValue)->isNullValue())
				continue;
				} else {
				continue;
				}
				if (Instruction *Inc =
				dyn_cast<Instruction>(PN->getIncomingValueForBlock(Backedge)))
				if (Inc->getOpcode() == Instruction::Add &&
				Inc->getOperand(0) == PN)
				if (ConstantInt *CI = dyn_cast<ConstantInt>(Inc->getOperand(1)))
				if (CI->equalsInt(1))
				return PN;
				}
				return nullptr;
				}

				void LoopEditor::computeBackedgeCountComparison() {
				ensureCanonicalInductionVariable();
				BackedgeCountComparison =
				cast<Instruction>(cast<BranchInst>(L->getLoopLatch()->getTerminator())
				->getCondition());

				// We only detect post-inc comparisons here. Importantly if this is a
				// post-inc expression we know that the computation of the trip count cannot
				// overflow.
				if (SE->getSCEV(BackedgeCountComparison->getOperand(0)) == TripCountSCEV)
				BackedgeCountComparisonValue = BackedgeCountComparison->getOperand(0);
				else if (SE->getSCEV(BackedgeCountComparison->getOperand(1)) == TripCountSCEV)
				BackedgeCountComparisonValue = BackedgeCountComparison->getOperand(1);
				else
				assert(0 && "Unknown backedge count comparison!");
				}

				Value *LoopEditor::getStartingValue() {
				ensureStartValuesAvailable();
				PHINode *PN = getCanonicalInductionVariable();

				// First, get the starting value of PN.
				Value *StartV = PN->getIncomingValueForBlock(L->getLoopPreheader());
				// Is this a PHI in the preheader?
				if (PHINode *StartPHI = dyn_cast<PHINode>(StartV))
				return StartPHI;

				BasicBlock *HeadB = Bypass ? Bypass : L->getLoopPreheader();

				PHINode *StartPHI = PHINode::Create(PN->getType(), 2, "edit.startphi",
				HeadB->getFirstInsertionPt());
				for (auto *Pred : predecessors(HeadB))
				StartPHI->addIncoming(StartV, Pred);

				PN->setIncomingValue(PN->getBasicBlockIndex(L->getLoopPreheader()), StartPHI);
				DEBUG(verify("Verification on exit from LoopEditor::getStartingValue()"));
				return StartPHI;
				}

				BasicBlock *LoopEditor::computeImmediateDominator(BasicBlock *B) {
				// Find the common dominator of all the blocks.
				BasicBlock *Dom = nullptr;
				for (auto *BB : predecessors(B)) {
				if (!Dom)
				Dom = BB;
				else
				Dom = DT->findNearestCommonDominator(Dom, BB);
				}
				return Dom;
				}

				void LoopEditor::makeValuesAvailableInBlocks(ArrayRef<Value*> Values,
				ArrayRef<BasicBlock*> Blocks) {
				// Find the common dominator of all the blocks.
				BasicBlock *Dom = nullptr;
				for (auto *BB : Blocks) {
				if (!Dom)
				Dom = BB;
				else
				Dom = DT->findNearestCommonDominator(Dom, BB);
				}

				// Now, are all the values already available at the end of the dominator?
				for (auto *V : Values) {
				if (!isa<Instruction>(V))
				continue;
				if (DT->dominates(cast<Instruction>(V), Dom->getTerminator()))
				continue;
				// No, this value is not. Can we shift it?
				// FIXME: can we make this recursive? we bail out too early currently.
				for (auto &O : cast<User>(V)->operands())
				if (isa<Instruction>(O.get()) &&
				!DT->dominates(cast<Instruction>(O.get()), Dom->getTerminator())) {
				assert(0 && "Unable to hoist all values!");
				}
				// OK, its operands are available. Hoist it.
				cast<Instruction>(V)->moveBefore(Dom->getTerminator());
				}
				}

				void LoopEditor::addStartPHIIncoming(PHINode *PN, Value *V, BasicBlock *BB) {
				// First, get the starting value of PN.
				Value *StartV = PN->getIncomingValueForBlock(L->getLoopPreheader());

				// Is this a PHI in the preheader?
				if (PHINode *StartPHI = dyn_cast<PHINode>(StartV)) {
				// Update it.
				StartPHI->addIncoming(V, BB);
				return;
				}

				BasicBlock *HeadB = Bypass ? Bypass : L->getLoopPreheader();

				PHINode *StartPHI = PHINode::Create(PN->getType(), 2, "edit.startphi",
				HeadB->getFirstInsertionPt());
				for (auto *Pred : predecessors(HeadB))
				StartPHI->addIncoming((Pred == BB) ? V : StartV, Pred);
				PN->setIncomingValue(PN->getBasicBlockIndex(L->getLoopPreheader()), StartPHI);

				}

				bool LoopEditor::sameProvenance(const PHIID &A, const PHIID &B) {
				return A.Val == B.Val || A.BasedOn == B.Val || A.Val == B.BasedOn ||
				(A.BasedOn && B.BasedOn && A.BasedOn == B.BasedOn);
				}

				Value *LoopEditor::determineStartValueForPHI(PHINode *PN) {
				Value *StartV = PN->getIncomingValueForBlock(L->getLoopPreheader());
				if (!isa<PHINode>(StartV))
				return StartV;

				// If the PHI:
				// - is located in the preheader and
				// - for the incoming values that are constants those constants are the same
				// then this is a StartPHI and we can say that the start value of this phi
				// is the constant.
				auto *FirstBlock = Bypass ? Bypass : L->getLoopPreheader();
				auto *StartPN = cast<PHINode>(StartV);
				if (StartPN->getParent() != FirstBlock)
				return StartPN;

				Value *C = nullptr;
				for (auto &V : StartPN->incoming_values()) {
				if (!C)
				C = V;
				else if (C != V)
				return StartPN;
				}
				if (C)
				return C;
				return StartPN;
				}

				void LoopEditor::replacePHINodeIncomingValues(BasicBlock *In, BasicBlock *For,
				BasicBlock *With) {
				for (auto I = In->begin(); isa<PHINode>(I); ++I) {
				PHINode *PN = cast<PHINode>(I);
				// We only delete the first incoming value for 'For'.
				if (PN->getBasicBlockIndex(For) != -1) {
				unsigned Idx = PN->getBasicBlockIndex(For);
				Value *V = PN->getIncomingValue(Idx);
				PN->removeIncomingValue(Idx, false);
				PN->addIncoming(V, With);
				}
				}
				}

				void LoopEditor::removePHINodeIncomingValues(BasicBlock *In, BasicBlock *For) {
				for (auto I = In->begin(); isa<PHINode>(I); ++I) {
				PHINode *PN = cast<PHINode>(I);
				// We only delete the first incoming value for 'For'.
				if (PN->getBasicBlockIndex(For) != -1)
				PN->removeIncomingValue(PN->getBasicBlockIndex(For), false);
				}
				}

				BasicBlock *LoopEditor::getDedicatedExitingBlock() {
				BasicBlock *ExitB = L->getExitBlock();
				if (ExitB && Bypass && ExitB->hasNUses(2) &&
				ExitB->isUsedInBasicBlock(Bypass) &&
				ExitB->isUsedInBasicBlock(L->getLoopLatch()))
				return ExitB;
				else if (ExitB && !Bypass && ExitB->hasOneUse() &&
				ExitB->isUsedInBasicBlock(L->getLoopLatch()))
				return ExitB;
				return nullptr;
				}

				void LoopEditor::analyze() {
				// The primary thing we need to analyze is the PHI nodes. PHIs must either
				// be classified as an induction or as a reduction.
				for (auto I = L->getHeader()->begin(); isa<PHINode>(*I); ++I) {
				PHINode *PN = cast<PHINode>(I);
				ConstantInt *CI;
				RecurrenceDescriptor RD;
				if (isInductionPHI(PN, SE, CI)) {
				Induction ID(determineStartValueForPHI(PN), PN);
				Inductions.push_back(ID);
				InductionsByValue.insert(std::make_pair(PN, ID));
				} else if (RecurrenceDescriptor::isReductionPHI(PN, L, RD)) {
				Reduction ID(determineStartValueForPHI(PN), PN);
				Reductions.push_back(ID);
				ReductionsByValue.insert(std::make_pair(PN, ID));
				} else {
				DEBUG(dbgs() << "LoopEditor: unidentified PHI!: " << *PN);
				return;
				}
				}

				TripCountSCEV = SE->getBackedgeTakenCount(L);
				if (isa<SCEVCouldNotCompute>(TripCountSCEV))
				return;
				TripCountSCEV = SE->getAddExpr(TripCountSCEV,
				SE->getConstant(TripCountSCEV->getType(), 1));

				Analyzed = true;
				}

				void LoopEditor::verify(const char *Text) {
				PrettyStackTraceString Trace(Text);

				assert(verifyFunction(*L->getHeader()->getParent(), &dbgs()) == false);
				DT->verifyDomTree();
				LI->verify();
				SE->verifyAnalysis();
				}

				//
				// Public API
				//

				LoopEditor::LoopEditor(Loop *L, ScalarEvolution *SE, const DataLayout *DL,
				LoopInfo *LI, DominatorTree *DT)
				: L(L), SE(SE), DL(DL), LI(LI), DT(DT), Analyzed(false), TripCountSCEV(nullptr),
				BackedgeCountComparisonValue(nullptr), Bypass(nullptr),
				ExecutedTripCountValue(nullptr) {
				analyze();
				}

				LoopEditor::LoopEditor(Loop *L, BasicBlock *Bypass, ScalarEvolution *SE,
				const DataLayout *DL, LoopInfo *LI, DominatorTree *DT)
				: L(L), SE(SE), DL(DL), LI(LI), DT(DT), Analyzed(false), TripCountSCEV(nullptr),
				BackedgeCountComparisonValue(nullptr), Bypass(Bypass),
				ExecutedTripCountValue(nullptr) {
				analyze();
				}

				bool LoopEditor::isValid() const {
				return Analyzed;
				}

				bool LoopEditor::contains(Instruction *I) {
				if (L->contains(I))
				return true;
				if (I->getParent() == Bypass || I->getParent() == getDedicatedExitingBlock())
				return true;
				return false;
				}

				bool LoopEditor::isValidReduction(Value *V) {
				return ReductionsByValue.count(V);
				}

				const LoopEditor::Reduction LoopEditor::getReductionByPHI(Value *V) {
				assert(ReductionsByValue.count(V));
				return ReductionsByValue[V];
				}

				ArrayRef<LoopEditor::Reduction> LoopEditor::getAllReductions() {
				return Reductions;
				}

				Value *LoopEditor::getOutgoingReduction(const Reduction &ID) {
				assert(L->contains(ID.Val) && "Unknown reduction!");
				// We always provide outgoing reductions in LCSSA form to make them
				// easier to re-analyze.
				Value *V = ID.Val->getIncomingValueForBlock(L->getLoopLatch());

				BasicBlock *DEB = getOrCreateDedicatedExitingBlock();

				// Do any of the PHIs in the DEB already use V? If so, just return that.
				for (auto I = DEB->begin(); isa<PHINode>(I); ++I)
				if (cast<PHINode>(I)->getIncomingValueForBlock(L->getLoopLatch()) == V)
				return I;

				// Create a new PHI.
				PHINode *PN = PHINode::Create(V->getType(), 2, "reduction.merge",
				DEB->getFirstInsertionPt());
				PN->addIncoming(V, L->getLoopLatch());
				PN->addIncoming(ID.getStartValue(), Bypass);

				DEBUG(verify("Verification on exit of LoopEditor::getOutgoingReduction()"));
				return PN;
				}

				LoopEditor::Reduction *LoopEditor::getMatchingReduction(const Reduction &R) {
				for (auto &Red : Reductions)
				if (Red.BasedOn == R.Val || Red.BasedOn == R.BasedOn)
				return &Red;
				return nullptr;
				}

				ArrayRef<LoopEditor::Induction> LoopEditor::getAllInductions() {
				return Inductions;
				}

				Value *LoopEditor::getOutgoingInduction(const Induction &ID) {
				assert(L->contains(ID.Val) && "Unknown induction!");
				Value *V = ID.Val->getIncomingValueForBlock(L->getLoopLatch());
				BasicBlock *DEB = getOrCreateDedicatedExitingBlock();

				// Do any of the PHIs in the DEB already use V? If so, just return that.
				for (auto I = DEB->begin(); isa<PHINode>(I); ++I)
				if (cast<PHINode>(I)->getIncomingValueForBlock(L->getLoopLatch()) == V)
				return I;

				// Create a new PHI.
				PHINode *PN = PHINode::Create(V->getType(), 2, "induction.merge",
				DEB->getFirstInsertionPt());
				PN->addIncoming(V, L->getLoopLatch());
				PN->addIncoming(ID.getStartValue(), Bypass);

				DEBUG(verify("Verification on exit of LoopEditor::getOutgoingInduction()"));
				return PN;
				}

				LoopEditor::Induction *LoopEditor::getMatchingInduction(const Induction &I) {
				for (auto &Ind : Inductions)
				if (Ind.BasedOn == I.Val || Ind.BasedOn == I.BasedOn)
				return &Ind;
				return nullptr;
				}

				Value *LoopEditor::getExecutedTripCount() {
				// If we have a valid trip count already, use that.
				if (ExecutedTripCountValue)
				return ExecutedTripCountValue;

				// Otherwise, we need to construct it. If we go into the loop we have
				// the loop induction variable - what if we have a bypass?
				ensureCanonicalInductionVariable();
				ExecutedTripCountValue = InductionsByValue[getCanonicalInductionVariable()].Val
				->getIncomingValueForBlock(L->getLoopLatch());
				if (!Bypass)
				return ExecutedTripCountValue;

				// We'll have to PHI zero (from the bypass) and the trip count together.
				BasicBlock *BB = getOrCreateDedicatedExitingBlock();
				assert(BB->getNumUses() == 2 && "Expected two incoming arcs to exit block!");

				IRBuilder<> IRB(BB->getFirstInsertionPt());
				PHINode *PN = IRB.CreatePHI(TripCountSCEV->getType(), 2, "executed_trip_count");
				PN->addIncoming(ConstantInt::get(cast<IntegerType>(TripCountSCEV->getType()), 0),
				Bypass);
				PN->addIncoming(ExecutedTripCountValue, L->getLoopLatch());
				ExecutedTripCountValue = PN;

				DEBUG(verify("Verification on exit from LoopEditor::getExecutedTripCount()"));
				return ExecutedTripCountValue;
				}

				const SCEV *LoopEditor::getTripCount() {
				return TripCountSCEV;
				}

				void LoopEditor::setTripCount(const SCEV *TC, bool CheckForOverflow) {
				DEBUG(verify("Verification on entry to LoopEditor::setTripCount()"));
				// FIXME: Change "checkforoverflow" to a negated "don't check for overflow".
				// As we have a post-inc comparison value, we may be able to determine that
				// overflow doesn't need checking.
				//
				// If we don't know the trip count as a value, we'll need to teach ourselves
				// it first.
				if (!BackedgeCountComparisonValue) {
				ensureCanonicalInductionVariable();
				computeBackedgeCountComparison();
				}
				TripCountSCEV = TC;

				auto *OldBEComparisonValue = BackedgeCountComparisonValue;
				Type *Ty = TripCountSCEV->getType();

				// Cache these here so LoopInfo doesn't get confused when we rip out the
				// bypass below.
				BasicBlock *ExitB = L->getExitBlock();
				BasicBlock *PreheaderB = L->getLoopPreheader();

				// Now check that the loop will be executed at least once.
				ensureBypassCreated();
				Bypass->getTerminator()->eraseFromParent();

				IRBuilder<> IRB(Bypass);
				IRB.CreateCondBr(IRB.getFalse(), ExitB, PreheaderB);

				SCEVExpander Expander(*SE, *DL, "backedgecount");
				Value *StartV = getStartingValue();
				Value *TCV = Expander.expandCodeFor(TripCountSCEV, Ty,
				Bypass->getFirstInsertionPt());
				BackedgeCountComparisonValue = TCV;

				IRB.SetInsertPoint(Bypass->getTerminator());
				// If the two values are different types, zero-extend to the largest.
				if (StartV->getType()->getScalarSizeInBits() >
				TCV->getType()->getScalarSizeInBits())
				TCV = IRB.CreateZExt(TCV, StartV->getType());
				else if (StartV->getType()->getScalarSizeInBits() <
				TCV->getType()->getScalarSizeInBits())
				StartV = IRB.CreateZExt(StartV, TCV->getType());

				Value *Check = IRB.CreateICmpUGE(StartV, TCV);
				if (CheckForOverflow) {
				auto *CI = ConstantInt::get(cast<IntegerType>(TCV->getType()), 0 );
				Check = IRB.CreateOr(Check, IRB.CreateICmpEQ(TCV, CI));
				}
				cast<BranchInst>(Bypass->getTerminator())->setCondition(Check);

				BackedgeCountComparison->replaceUsesOfWith(OldBEComparisonValue,
				BackedgeCountComparisonValue);

				// We've updated the trip count. This makes SCEVs invalid.
				SE->forgetLoop(L);
				DEBUG(verify("Verification on exit from LoopEditor::setTripCount()"));
				}

				LoopEditor LoopEditor::clone(Use &IncomingEdge,
				const Twine &NameSuffix) {
				DEBUG(verify("Verification on entry to LoopEditor::clone()"));
				SmallVector<BasicBlock*,8> NewBlocks;
				ValueToValueMapTy VM;
				BasicBlock *Preheader = L->getLoopPreheader();
				BasicBlock *InsertPt = L->getLoopPreheader();
				BasicBlock *Dominator = cast<Instruction>(IncomingEdge.getUser())->getParent();
				BasicBlock *DedicatedExitB = getDedicatedExitingBlock();
				BasicBlock *LastBlock = DedicatedExitB ? DedicatedExitB : L->getExitingBlock();
				BasicBlock *ExitBlock = DedicatedExitB ?
				DedicatedExitB->getTerminator()->getSuccessor(0) :
				L->getExitBlock();

				if (Bypass) {
				BasicBlock *BB = CloneBasicBlock(Bypass, VM, NameSuffix);
				BB->insertInto(InsertPt->getParent(), InsertPt);
				VM[Bypass] = BB;
				DT->addNewBlock(BB, Dominator);
				NewBlocks.push_back(BB);
				if (L->getParentLoop())
				L->getParentLoop()->addBasicBlockToLoop(BB, *LI);
				}
				Loop *NewL = cloneLoopWithPreheader(InsertPt,
				Bypass ? cast<BasicBlock>(VM[Bypass]) : Dominator,
				L, VM, NameSuffix, LI, DT, NewBlocks);
				if (DedicatedExitB && !L->contains(DedicatedExitB)) {
				BasicBlock *BB = CloneBasicBlock(DedicatedExitB, VM, NameSuffix);
				BB->insertInto(InsertPt->getParent(), InsertPt);
				VM[DedicatedExitB] = BB;
				DT->addNewBlock(BB, Bypass ? cast<BasicBlock>(VM[Bypass]) :
				cast<BasicBlock>(VM[L->getExitingBlock()]));
				NewBlocks.push_back(BB);
				if (L->getParentLoop())
				L->getParentLoop()->addBasicBlockToLoop(BB, *LI);
				}
				remapInstructionsInBlocks(NewBlocks, VM);

				BasicBlock *BB = cast<BasicBlock>(IncomingEdge.get());
				// There's an edge between Dominator and BB, and we want to insert a loop
				// in the middle of it:
				//
				//   Dom           Dom
				//    |      =>      \
				//    |              LOOP
				//   BB              /
				//                 BB
				// So we need to change any incoming PHI values on that edge.
				replacePHINodeIncomingValues(BB, Dominator, cast<BasicBlock>(VM[LastBlock]));
				// If it was pointing into the old loop, we'll also have to remove the cloned
				// incomings.
				if (L->contains(BB) || BB == Bypass || BB == Preheader || BB == DedicatedExitB)
				removePHINodeIncomingValues(cast<BasicBlock>(VM[BB]), Dominator);

				// We also need to change any PHI nodes in our new entry block to point
				// to our new predecessor, Dom.
				BasicBlock *FirstBlock = Bypass ? Bypass : Preheader;
				for (auto *P : predecessors(FirstBlock))
				replacePHINodeIncomingValues(cast<BasicBlock>(VM[FirstBlock]),
				P, Dominator);

				// Set the incoming and outgoing edges.
				IncomingEdge.set(cast<BasicBlock>(VM[FirstBlock]));
				auto *T = cast<BasicBlock>(VM[LastBlock])->getTerminator();
				T->replaceUsesOfWith(ExitBlock, BB);

				assert(NewL->getLoopPreheader() && "New loop is malformed!");
				assert(NewL->getHeader() && "New loop is malformed!");

				DT->changeImmediateDominator(ExitBlock,
				computeImmediateDominator(ExitBlock));

				LoopEditor LE(NewL,
				Bypass ? cast<BasicBlock>(VM[Bypass]) : nullptr,
				SE, DL, LI, DT);
				assert(LE.isValid() && "Generated loopeditor is not valid!");

				// Remap the inductions and reductions.
				for (auto &I : Inductions)
				LE.InductionsByValue[VM[I.Val]].BasedOn = I.Val;
				for (auto &I : LE.Inductions)
				I.BasedOn = LE.InductionsByValue[I.Val].BasedOn;
				for (auto &I : Reductions)
				LE.ReductionsByValue[VM[I.Val]].BasedOn = I.Val;
				for (auto &I : LE.Reductions)
				I.BasedOn = LE.ReductionsByValue[I.Val].BasedOn;

				DEBUG(verify("Verification on exit from LoopEditor::clone()"));
				return LE;
				}

				void LoopEditor::widen(unsigned Factor) {
				DEBUG(verify("Verification on entry to LoopEditor::widen()"));
				const SCEV *NewTC = SE->getUDivExpr(TripCountSCEV,
				SE->getConstant(TripCountSCEV->getType(),
				Factor));
				setTripCount(NewTC, true);

				for (auto &I : Inductions)
				if (I.Val != getCanonicalInductionVariable()) {
				Value *OldV = I.Val->getIncomingValueForBlock(L->getLoopLatch());
				Type *Ty = OldV->getType();
				// FIXME: this is wrong! doesn't work for -1 or non-unit.
				Value *NewV = BinaryOperator::Create(Instruction::Add,
				OldV,
				ConstantInt::get(Ty, Factor - 1),
				OldV->getName(),
				L->getLoopLatch()->getTerminator());
				I.Val->setIncomingValue(I.Val->getBasicBlockIndex(L->getLoopLatch()),
				NewV);
				}

				SE->forgetLoop(L);
				DEBUG(verify("Verification on exit from LoopEditor::widen()"));
				}

				LoopEditor::Reduction LoopEditor::addReduction(Value *StartValue) {
				PHINode *PN = PHINode::Create(StartValue->getType(), 2, "rdx.loopedit",
				L->getHeader()->getFirstInsertionPt());
				PN->addIncoming(StartValue, L->getLoopPreheader());
				PN->addIncoming(PN, L->getLoopLatch());

				Reduction R(StartValue, PN);
				Reductions.push_back(R);
				ReductionsByValue[PN] = R;

				DEBUG(verify("Verification on exit from LoopEditor::addReduction()"));
				return R;
				}

				void LoopEditor::connectReduction(const Reduction ID, Value *V) {
				ID.Val->setIncomingValue(ID.Val->getBasicBlockIndex(L->getLoopLatch()),
				V);
				DEBUG(verify("Verification on exit from LoopEditor::connectReduction()"));
				}

				void LoopEditor::interleave(ValueToValueCB InductionF,
				ValueToValueCB ReductionUseF,
				std::function<void(Value*,const Reduction)> ReductionDefF,
				Delegate *D) {
				// Precache all the instructions we want to clone, so we're resilient to
				// changes in the function.
				SmallVector<Instruction*,32> InstsToClone;
				SmallVector<Instruction*,32> InstsToRemap;
				for (auto *BB : L->getBlocks()) {
				if (BB == L->getLoopPreheader())
				continue;
				for (auto &I : *BB)
				if (!isa<TerminatorInst>(&I))
				InstsToClone.push_back(&I);
				}

				// Now do the clone.
				ValueToValueMapTy VM;
				for (auto *I : InstsToClone) {
				if (InductionsByValue.count(I)) {
				IRBuilder<> IRB(I->getParent()->getFirstInsertionPt());
				VM[I] = InductionF(I, IRB);
				} else if (ReductionsByValue.count(I)) {
				IRBuilder<> IRB(I->getParent()->getFirstInsertionPt());
				VM[I] = ReductionUseF(I, IRB);
				} else {
				Instruction *NewI = I->clone();
				NewI->insertAfter(I);
				InstsToRemap.push_back(NewI);
				VM[I] = NewI;
				}
				}
				for (auto *I : InstsToRemap)
				RemapInstruction(I, VM, RF_IgnoreMissingEntries);

				for (auto &R : Reductions)
				ReductionDefF(VM[getOutgoingReduction(R)], R);

				if (D)
				for (auto *I : InstsToClone)
				// FIXME: cast to instruction not right here - should be Value.
				D->notifyInstructionInterleaved(I, cast<Instruction>(VM[I]));

				DEBUG(verify("Verification on exit from LoopEditor::interleave()"));
				}

				BasicBlock *LoopEditor::getOrCreateDedicatedExitingBlock() {
				assert(L->getExitingBlock() && "Expect only one latch!");
				if (getDedicatedExitingBlock())
				return getDedicatedExitingBlock();
				BasicBlock *ExitB = L->getExitBlock();

				// OK, time to create one.
				BasicBlock *NewExitB = BasicBlock::Create(getContext(),
				"loopedit.dedicatedexit",
				ExitB->getParent(),
				ExitB);
				IRBuilder<> IRB(NewExitB);
				IRB.CreateBr(ExitB);
				L->getLoopLatch()->getTerminator()->replaceUsesOfWith(ExitB, NewExitB);
				if (Bypass)
				Bypass->getTerminator()->replaceUsesOfWith(ExitB, NewExitB);

				// If any PHIs in ExitB refer to values computed within our loop,
				// create an equivalent PHI in NewExitB and point the original PHI at it.
				for (auto I = ExitB->begin(); isa<PHINode>(I); ++I) {
				PHINode *PN = cast<PHINode>(I);
				PHINode *NewPN = cast<PHINode>(I->clone());
				NewPN->insertBefore(NewExitB->getFirstInsertionPt());

				// Remove any edges from predecessors that aren't our bypass or latch
				for (auto *P : predecessors(ExitB))
				if (P != Bypass && P != L->getLoopLatch() &&
				NewPN->getBasicBlockIndex(P) != -1)
				NewPN->removeIncomingValue(NewPN->getBasicBlockIndex(P));

				PN->removeIncomingValue(PN->getBasicBlockIndex(L->getLoopLatch()));
				if (Bypass)
				PN->removeIncomingValue(PN->getBasicBlockIndex(Bypass));
				PN->addIncoming(NewPN, NewExitB);
				}

				DT->addNewBlock(NewExitB, Bypass ? Bypass : L->getLoopLatch());
				// Because we know ExitB has predecessors other than within our loop (else
				// it would be defined as a dedicated exiting block), its order in the
				// DomTree is explicitly not affected.
				if (L->getParentLoop())
				L->getParentLoop()->addBasicBlockToLoop(NewExitB, *LI);

				DEBUG(verify("Verification on exit from LoopEditor::getOrCreateDedicatedExitingBlock()"));
				return NewExitB;
				}

				void LoopEditor::updateExitBlock(BasicBlock *BB) {
				BasicBlock *EB = getOrCreateDedicatedExitingBlock();
				BasicBlock *CurrentExitB = EB->getTerminator()->getSuccessor(0);
				EB->getTerminator()->setSuccessor(0, BB);

				BasicBlock *NewDom = nullptr;
				for (auto *Pred : predecessors(CurrentExitB))
				NewDom = NewDom ? DT->findNearestCommonDominator(NewDom, Pred) : Pred;
				assert (NewDom);
				DT->changeImmediateDominator(CurrentExitB, NewDom);

				removePHINodeIncomingValues(CurrentExitB, EB);
				}

				void LoopEditor::addIncoming(
				LoopEditor &LE,
				const std::map<const Reduction, Value*> &OtherReductions,
				const std::map<const Induction, Value*> &OtherInductions) {
				DEBUG(verify("Verification on entry to LoopEditor::addIncoming(LoopEditor&)"));
				assert(Reductions.size() == OtherReductions.size() && "Reductions size mismatch!");
				assert(Inductions.size() == OtherInductions.size() && "Inductions size mismatch!");

				ensureStartValuesAvailable();
				ensureCanonicalInductionVariable();

				BasicBlock *BB = LE.getOrCreateDedicatedExitingBlock();

				LE.updateExitBlock(Bypass ? Bypass : L->getLoopPreheader());

				// Try and match up the reductions and inductions.
				unsigned SeenReductions = 0, SeenInductions = 0;
				for (auto &OtherKV : OtherReductions)
				for (auto &ThisR : Reductions)
				if (sameProvenance(ThisR, OtherKV.first)) {
				SeenReductions++;
				addStartPHIIncoming(ThisR.Val, OtherKV.second, BB);
				}
				for (auto &OtherKV : OtherInductions)
				for (auto &ThisI : Inductions)
				if (sameProvenance(ThisI, OtherKV.first)) {
				SeenInductions++;
				addStartPHIIncoming(ThisI.Val, OtherKV.second, BB);
				}
				assert(SeenReductions == Reductions.size() && "Not all reductions matched!");
				assert(SeenInductions == Inductions.size() && "Not all inductions matched!");

				DEBUG(verify("Verification on exit from LoopEditor::addIncoming(LoopEditor&)"));
				}

				void LoopEditor::addIncoming(BasicBlock *BB) {
				DEBUG(verify("Verification on entry to LoopEditor::addIncoming(BasicBlock*)"));
				ensureStartValuesAvailable();

				for (auto &R : Reductions)
				addStartPHIIncoming(R.Val, R.StartValue, BB);
				for (auto &I : Inductions)
				addStartPHIIncoming(I.Val, I.StartValue, BB);

				DEBUG(verify("Verification on exit from LoopEditor::addIncoming(BasicBlock*)"));
				}

				void LoopEditor::removeIncoming(BasicBlock *BB) {
				BasicBlock *StartB = Bypass ? Bypass : L->getLoopPreheader();
				for (auto I = StartB->begin(); isa<PHINode>(&*I); ++I)
				cast<PHINode>(I)->removeIncomingValue(BB);

				DEBUG(verify("Verification on exit from LoopEditor::removeIncoming()"));
				}

				//
				// Macro-mutators
				//

				void LoopEditor::widenAndInterleaveLoop(unsigned Factor,
				std::map<const Reduction, Value*> &Ret,
				Delegate *D) {
				widen(Factor);

				std::map<Reduction, std::vector<Reduction> > RedMap;
				for (auto ID : getAllReductions())
				for (unsigned I = 1; I < Factor; ++I)
				RedMap[ID].push_back(addReduction(ID.getStartValue()));

				for (unsigned Iter = 1; Iter < Factor; ++Iter) {
				if (D) D->notifyInterleaveIterationStarting(Iter);
				interleave(
				[=](Value *V, IRBuilder<> &IRB) {
				return IRB.CreateAdd(V, ConstantInt::get(cast<IntegerType>(V->getType()),
				Iter));
				},
				[&](Value *V, IRBuilder<> &IRB) {
				return RedMap[getReductionByPHI(V)][Iter - 1].getPHI();
				},
				[&](Value *V, const Reduction OldR) {
				connectReduction(RedMap[OldR][Iter - 1], V);
				},
				D);
				}

				auto *ExitBB = getOrCreateDedicatedExitingBlock();
				IRBuilder<> IRB(ExitBB->getTerminator());

				for (auto &KV : RedMap) {
				Value *V = getOutgoingReduction(KV.first);
				for (auto &ID : KV.second)
				V = ID.createOp(IRB, V, getOutgoingReduction(ID));
				Ret[KV.first] = V;
				}
				}

				LoopEditor LoopEditor::versionWidenAndInterleaveLoop(unsigned Factor,
				                                                     BasicBlock *&PredBB,
				                                                     Delegate *D) {
				  BasicBlock *Preheader = L->getLoopPreheader();
				  BasicBlock *EntryFrom = Preheader->getSinglePredecessor();
				  assert(EntryFrom && "Couldn't get entry predecessor!");

				  // Create a block to hold any predicates, and connect it to
				  // the original loop to keep it in the CFG.
				  removeIncoming(EntryFrom);
				  PredBB = BasicBlock::Create(getContext(), "wide.pred.check",
				                              Preheader->getParent(),
				                              Preheader);
				  auto *BI = BranchInst::Create(Preheader, Preheader,
				                                ConstantInt::getTrue(getContext()),
				                                PredBB);
				  EntryFrom->getTerminator()->replaceUsesOfWith(Preheader, PredBB);
				  DT->addNewBlock(PredBB, EntryFrom);
				  DT->changeImmediateDominator(Preheader, PredBB);
				  addIncoming(PredBB);

				  // BI is a conditional branch, so its operands are laid out as
				  // (condition, false target, true target). Operand 2 is therefore the
				  // Use of Preheader on the true edge, i.e. successor 0. That's the Use
				  // that we'll hang the cloned loop off of.
				  Use &U = BI->getOperandUse(2);
				  assert(U.get() == Preheader);

				  // Clone the original loop.
				  LoopEditor WideL = clone(U, ".wide");

				  // Now widen the loop.
				  std::map<const Reduction, Value*> ReducedReductions;
				  WideL.widenAndInterleaveLoop(Factor, ReducedReductions);

				  // Now connect the wide exit to the original (our) entry.
				  std::map<const Induction, Value*> Inductions;
				  for (auto ID : WideL.getAllInductions())
				    Inductions[ID] = WideL.getOutgoingInduction(ID);
				  addIncoming(WideL, ReducedReductions, Inductions);

				  return WideL;
				}

				void LoopEditor::peelAfter(unsigned Iterations, Delegate *D) {
				  // To peel an iteration, we reduce the loop iteration count,
				  // clone it, then reduce the clone's iteration count to one to effectively
				  // unloop it.
				  setTripCount(SE->getMinusSCEV(getTripCount(),
				                                SE->getConstant(getTripCount()->getType(),
				                                                Iterations)),
				               true /*FIXME, what?*/);

				  BasicBlock *ExitB = getOrCreateDedicatedExitingBlock();
				  for (unsigned Iteration = 0; Iteration < Iterations; ++Iteration) {
				    // A dedicated exit block will always only have one successor.
				    Use &U = ExitB->getTerminator()->getOperandUse(0);
				    LoopEditor NewLE = clone(U, Twine(".peel-") + Twine(Iteration));
				    NewLE.setTripCount(SE->getConstant(getTripCount()->getType(), 1), false);

				    // Connect up the reductions and inductions.
				    std::map<const Induction, Value*> Inductions;
				    for (auto ID : getAllInductions()) {
				      Inductions[ID] = getOutgoingInduction(ID);
				      auto *Match = NewLE.getMatchingInduction(ID);
				      // FIXME: This isn't very elegant. Find a better way to do this.
				      assert(Match);
				      std::vector<Use*> V;
				      for (auto &U : Inductions[ID]->uses())
				        if (!contains(cast<Instruction>(U.getUser())))
				          V.push_back(&U);
				      for (auto *U : V)
				        U->set(NewLE.getOutgoingInduction(*Match));
				    }
				    std::map<const Reduction, Value*> Reductions;
				    for (auto ID : getAllReductions()) {
				      Reductions[ID] = getOutgoingReduction(ID);
				      auto *Match = NewLE.getMatchingReduction(ID);
				      assert(Match);
				      std::vector<Use*> V;
				      for (auto &U : Reductions[ID]->uses())
				        if (!contains(cast<Instruction>(U.getUser())))
				          V.push_back(&U);
				      for (auto *U : V)
				        U->set(NewLE.getOutgoingReduction(*Match));
				    }

				    NewLE.addIncoming(*this, Reductions, Inductions);
				  }
				}
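
Illustration (not part of the patch): the comment at the top of peelAfter() describes peeling as "reduce the trip count, clone, set the clone's trip count to one". The sketch below shows what that is meant to achieve at the source level for a simple sum reduction. It is plain C++ and does not use LLVM or the LoopEditor API. The names sumReference and sumPeeledAfter are hypothetical, and the clamp on Iterations is an assumption added so the sketch is safe to run; the prototype itself has no such guard.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference loop: a plain sum reduction over Data.
long sumReference(const std::vector<long> &Data) {
  long Acc = 0;
  for (std::size_t I = 0; I < Data.size(); ++I)
    Acc += Data[I];
  return Acc;
}

// Hand-written picture of peeling `Iterations` iterations off the end of
// the loop above.
long sumPeeledAfter(const std::vector<long> &Data, std::size_t Iterations) {
  // Assumption for this sketch only: never peel more iterations than exist.
  if (Iterations > Data.size())
    Iterations = Data.size();

  // Step 1: the main loop's trip count is reduced by Iterations.
  const std::size_t MainTripCount = Data.size() - Iterations;

  std::size_t I = 0; // induction variable, carried across all copies
  long Acc = 0;      // reduction value, carried across all copies

  for (; I < MainTripCount; ++I)
    Acc += Data[I];

  // Step 2: each peeled copy is the loop body with a trip count of exactly
  // one. It starts from the induction and reduction values that the
  // previous copy (or the main loop) left behind.
  for (std::size_t Peel = 0; Peel < Iterations; ++Peel) {
    Acc += Data[I];
    ++I;
  }
  return Acc;
}
```

For any input, sumPeeledAfter(Data, K) should return the same value as sumReference(Data). The inner "Peel" loop here only stands in for K separate straight-line copies of the body, which is what chaining K single-iteration clones would produce in the IR.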
