This is an archive of the discontinued LLVM Phabricator instance.

Differential D11530

RFC: LoopEditor, a high-level loop transform toolkit
Needs Review · Public

Authored by jmolloy on Jul 27 2015, 10:03 AM.

Details

Reviewers

anemet
mzolotukhin
aschwaighofer
hfinkel

Summary

This is a prototype API designed to perform or to enable high-level loop transformations. It is my hope that it could be used throughout LLVM's vectorizers and loop transforms as the go-to API for editing loops.

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 30702. Jul 27 2015, 10:03 AM

jmolloy retitled this revision to RFC: LoopEditor, a high-level loop transform toolkit.

jmolloy updated this object.

jmolloy set the repository for this revision to rL LLVM.

Hi Adam, Hal, Michael, Nadav, Arnold,

Thanks for your comments on my RFC thread.

This latest diff is actually for you to review; I have three things here:

1. The latest incarnation of the LoopEditor header/API.
2. A stubbed-out .cpp so it all compiles and links.
3. A proof-of-concept showing the LoopEditor applied to the LoopVectorizer and replacing createEmptyBlock().

I am only looking to commit (1) and (2); (3) is here as an example of how the LoopEditor might fit. I should mention that (3) looks horrible, and rightly so. There would need to be a bunch of refactoring, and for this proof of concept I've tried to keep the LoC count down. But I would, in the end:

1. Move almost all the InnerLoopVectorizer code into the delegate, or more cleanly make InnerLoopVectorizer a subclass of Delegate.
2. Start by just using LoopEditor to replace createEmptyBlock and let LoopVectorizer do most of the cloning itself.
3. Then start to de-duplicate a lot of the cloning functionality: removing the unroll factor completely from the LoopVectorizer class as a start (LoopEditor can do all that itself), then moving from LoopVectorizer managing its own PHIs to LoopEditor doing that for it.
4. By the end, hopefully have something substantially cleaner with a very distinct and testable API.

Specific improvements in the API since my RFC:

- Add the minimal hook functions that the LoopVectorizer needs to the delegate.
- Add the concept of an "AnalysisLevel": different mutations or analyses on a loop may depend on being able to analyze more or less information about the loop. For example, loop cloning (as used by loop distribution) doesn't need to identify all recurrences, but loop vectorization does.
- Revamp Reductions and Inductions and how they're represented, with a proper class hierarchy.
- Streamline the API a tad.
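The "AnalysisLevel" idea can be pictured with a small standalone sketch. It reuses the enumerator names from LoopEditor.h in this diff and assumes the levels are totally ordered (the header's "AL_TripCount or above" wording suggests this); the helper satisfies() and the two requirement constants are hypothetical and not part of the patch.

```cpp
#include <cassert>

// Enumerator names taken from LoopEditor.h in this diff; the ordering from
// "least analyzed" to "most analyzed" is an assumption of this sketch.
enum AnalysisLevel { AL_None, AL_TripCount, AL_LCSSA, AL_AllRecurrences };

// Hypothetical helper (not in the patch): a mutation may run only if the loop
// was analyzed at least to the level the mutation requires.
constexpr bool satisfies(AnalysisLevel Have, AnalysisLevel Need) {
  return Have >= Need;
}

// Hypothetical requirements for the two clients mentioned above: cloning
// needs no knowledge of recurrences, vectorization needs all of them.
constexpr AnalysisLevel RequiredForCloning = AL_None;
constexpr AnalysisLevel RequiredForVectorizing = AL_AllRecurrences;

static_assert(satisfies(AL_LCSSA, RequiredForCloning),
              "cloning is always allowed in this sketch");
```

With this shape, a loop that could only be analyzed up to its trip count could still be cloned but would be rejected for vectorization.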

Please let me know your thoughts!

James

jmolloy added reviewers: anemet, aschwaighofer, mzolotukhin, hfinkel, reames. Aug 4 2015, 7:17 AM

jmolloy added a subscriber: llvm-commits.

Hi all,

Does anyone have any thoughts on the API design? I'm keen to get cracking on this.

Cheers,

James

Hi James,

I've started looking at this and should get back to you today or tomorrow.

Adam

I think this looks quite useful.

One thing that is not obvious to me is what the LoopEditor will do for loops that are not inner loops. Are any of these operations inner-loop only?

Also, how will if-conversion (during vectorization, etc.) work?

James,

I haven't looked at the patch, just getting my bearings...

Is this just for the vectorizer, or are you planning to extend this to *any* loop transformation?

We have some shared methods in lib/Transforms/Utils/LoopUtils.cpp, which you don't seem to touch. Are you planning on joining all loop transformations into one tool-kit, then use for all vectorizer, unroller, simplify, rotation, etc.?

cheers,
--renato

I'm failing to see (partially because it's my first time looking at it) how your new hooks will make sure all the pre-analysis is done before the actual vectorization. But it seems Adam and others are more familiar with it, so I'll just let them comment about that.

Having said that, I welcome any change that makes it easier to share code, and it now seems to me that other loop passes will be able to use the same hooks and reduce their complexity as well as you did in the vectorizer.

cheers,
--renato

lib/Target/ARM/Thumb2ITBlockPass.cpp
186	oops

Hi James,

Now, I read the whole patch and I guess I am still more-or-less left with my original questions from the RFC thread.

Can we further decompose the functionality beyond LoopEditor? One straw-man I was thinking of is to focus on the widening functionality. Both interleaving and loop vectorization could be considered widening, differing only in the means by which they achieve it (vectorizing or duplicating instructions). The hook could then provide the instruction-specific widening operation and define how reduction values should be recovered at the exit of the loop.

I would also encourage you to use LoopVersioning. The idea was certainly to convert the vectorizer to be another client of LVer. If you need to add hooks or whatnot, feel free. The plan for LoopVersioning has always been to go beyond just alias checks (we have immediate plans to add dynamic loop trip-count checks), so having to reimplement this in various places would be a mistake now that I have finally refactored it.
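As an illustration of versioning on something other than an alias check, here is a minimal standalone sketch in which a runtime trip-count predicate selects between two versions of the same loop. It does not use the real LoopVersioning class; chooseVersion(), sumVersioned() and the Version enum are hypothetical names for this toy.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

enum class Version { Optimized, Original };

// Hypothetical helper: any predicate at all may pick the version.
Version chooseVersion(const std::function<bool()> &Predicate) {
  return Predicate() ? Version::Optimized : Version::Original;
}

// Both versions compute the same sum; only the iteration strategy differs.
// Taken, if non-null, reports which version ran.
long sumVersioned(const std::vector<int> &A, std::size_t MinTripCount,
                  Version *Taken = nullptr) {
  // Dynamic trip-count check: only use the optimized loop when it pays off.
  Version V = chooseVersion([&] { return A.size() >= MinTripCount; });
  if (Taken)
    *Taken = V;
  long Sum = 0;
  if (V == Version::Optimized) {
    std::size_t I = 0;
    for (; I + 2 <= A.size(); I += 2) // two elements per iteration
      Sum += A[I] + A[I + 1];
    for (; I < A.size(); ++I)         // scalar remainder
      Sum += A[I];
  } else {
    for (int X : A)                   // original loop, always safe
      Sum += X;
  }
  return Sum;
}
```

The point of the sketch is that the selection logic is independent of what the predicate checks, so alias checks, overflow checks and trip-count checks could share it.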

I would also think that it would be a design mistake to re-implement LoopVersioning on top of something more complex, i.e. LoopEditor. I would rather have classes that are as simple as possible for things like LoopVersioning, LoopWidening, LoopPeeling, etc., and then compose transformations using these classes. This would make things more explicit and make correctness easier to reason about: input and output state, required analyses, modified analyses. (Having AnalysisLevel in the class is an indication to me that the class is not properly decomposed.)

It is also unclear to me why you want to pin down the API by committing it with the implementation stubbed out. I think it's good to discuss the end goal, but why can't we invent and refine the API gradually as you start refactoring and transitioning over the existing functionality?

Thanks,
Adam

In D11530#220815, @anemet wrote:

> I would rather have classes that are as simple as possible for things like LoopVersioning, LoopWidening, LoopPeeling, etc., and then compose transformations using these classes. This would make things more explicit and make correctness easier to reason about: input and output state, required analyses, modified analyses.

I agree with this statement. Having multiple, independent, focused tools, used by multiple, independent passes is a better design.

It may, however, need some redundant information about the state of things on each tool / user, but I think we can manage it.

cheers,
--renato

Hi Adam, Renato,

Thanks for your review and sorry for the time it took to get back to you.

I think I agree with all your comments. The LoopEditor structure is a bit of a monolith - I hadn't really noticed as I was designing it. I like your idea of composable operations more (and it shows that I should re-read a design patterns book sometime soon!). I confess that I did see LoopVersioning as a trivial user of LoopEditor, but I do understand your reservations here.

I'm also not pushing for the entire API to be pushed in one go - it was just, as you said, describing the end goal.

So it looks like this becomes:

1. Improvements to LoopVersioning to make LAA optional (choosing one loop or another shouldn't be tied to LAA - any predicate at all should do fine).
2. [My own requirement] Make sure LoopVersioning works with non-leaf loops.
3. Create a new class LoopWidening.
   - I feel like this should be an abstract base class with concrete subclasses "LoopVectorizing" and "LoopInterleaving".
4. [Later] Create a similar LoopPeeling class, that hopefully should sit on top of the other two.
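One way to picture the proposed LoopWidening base class is the toy below, which "widens" a summation loop over an array rather than LLVM IR. The class names follow the plan above, but every member (run(), widenedIteration(), reducePartials()) is a hypothetical stand-in chosen for this sketch; a LoopVectorizing subclass would override the same hooks with vector operations.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Toy model: the "loop" sums an array. Widening by Factor makes each main-loop
// iteration cover Factor elements; a scalar tail handles the remainder.
class LoopWidening {
public:
  explicit LoopWidening(std::size_t Factor) : Factor(Factor) {
    assert(Factor > 0 && "widening factor must be positive");
  }
  virtual ~LoopWidening() = default;

  long run(const std::vector<int> &A) const {
    std::vector<long> Partials(Factor, 0);
    std::size_t I = 0;
    for (; I + Factor <= A.size(); I += Factor)
      widenedIteration(&A[I], Partials);
    long Sum = reducePartials(Partials); // recover the reduction at loop exit
    for (; I < A.size(); ++I)            // scalar tail
      Sum += A[I];
    return Sum;
  }

protected:
  // Hook: how one widened iteration consumes Factor consecutive elements.
  virtual void widenedIteration(const int *Chunk,
                                std::vector<long> &Partials) const = 0;
  // Hook: how the partial reduction values are combined into one value.
  virtual long reducePartials(const std::vector<long> &Partials) const {
    return std::accumulate(Partials.begin(), Partials.end(), 0L);
  }
  std::size_t Factor;
};

// Interleaving: Factor independent scalar accumulators, one per copy.
class LoopInterleaving : public LoopWidening {
public:
  using LoopWidening::LoopWidening;

protected:
  void widenedIteration(const int *Chunk,
                        std::vector<long> &Partials) const override {
    for (std::size_t Lane = 0; Lane < Factor; ++Lane)
      Partials[Lane] += Chunk[Lane];
  }
};
```

The base class owns the shared parts (trip-count split, tail, reduction recovery) and the subclass supplies only the per-iteration behaviour, which is the split of responsibilities the thread is converging on.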

The names are up for bikeshedding.

This seems to make review much easier. What do you think of this as a plan?

Cheers,

James

Hi James,

Thanks very much for considering my comments. I agree with this direction overall. I have a few specific comments below:

> So it looks like this becomes:
>
> Improvements to LoopVersioning to make LAA optional (choosing one loop or another shouldn't be tied to LAA - any predicate at all should do fine).

Agreed. Silviu's Assumption-based SCEV is already proposing to use LoopVer to host overflow checks. I also have plans to add dynamic trip-count checks to allow a higher number of checks when we know the higher trip count will justify the additional overhead.

> [My own requirement] Make sure loopversioning works with non-leaf loops.

Makes sense.

> Create a new class LoopWidening.
>
> I feel like this should be an abstract base class with concrete subclasses "LoopVectorizing" and "LoopInterleaving".

There are certainly commonalities between interleaving and vectorization that we may need to be able to leverage by delegating the differences to hooks. I guess the way this shapes up will depend on the specifics of how the code will be split out from the Loop Vectorizer. Initially it will probably be pretty close to the structure of the code in the vectorizer.

> [Later] create a similar LoopPeeling class, that hopefully should sit on top of the other two.

I am not sure there is much code to be shared between LoopPeeling and Widening. Why do you think so?

> The names are up for bikeshedding.

Just for the record, I added the 'ing' ending to LoopVersioning to stress that I mean "version", the verb rather than the noun.

Thanks again,
Adam

Hi Adam,

I'm glad we're converging on a way forward!

> I am not sure there is much code to be shared between LoopPeeling and Widening. Why do you think so?

Commonalities would be cloning all the blocks in a loop, detecting and hooking up reductions/inductions, and modifying the loop trip count. But you're right, "on top of" was not the right thing to say.

Cheers,

James

Hi Adam,

I've been working on this for the past few days and wanted to give an update. The most interesting/annoying thing about this all is exactly how to split out/share code with the loopvectorizer.

My ideal would be to take the LoopVectorizer code and refactor it piece by piece until it's in a modular enough shape to encapsulate in a utility class. This would mean we'd maintain 100% test coverage throughout the process.

Unfortunately the code in question is a bit of a monolith and has resisted all of my attempts at chiselling into better shape. The main problem is that any sane composition model would perform a sequence of operations on a loop, mutating it in place. This is the model I'd like to get to. But the current model is to do one pass over the source loop, injecting any cloned instructions into a single IRBuilder.

This has meant that everything we do in the loop vectorizer happens in one place. If-conversion is done on-the-fly, as are induction PHI updates and remaps. Because of this it's very difficult to isolate one piece of functionality and refactor it - it's all or nothing.
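The "sequence of operations, each mutating the loop in place" model can be sketched on a toy loop descriptor. Nothing here is LLVM API: ToyLoop, widen(), interleave() and applyAll() are hypothetical names, and the widen step simply reuses the {5,1} to {5,4} example from the LoopEditor.h comments in this diff.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical toy description of a counted loop: an induction
// {Start,+,Step} that runs TripCount times, with Copies copies of the body.
struct ToyLoop {
  long Start;
  long Step;
  unsigned long TripCount;
  unsigned Copies;
};

using LoopTransform = std::function<void(ToyLoop &)>;

// Widening: same range, wider step, fewer iterations. Remainder iterations
// would go to a scalar tail, which this toy does not model.
LoopTransform widen(unsigned Factor) {
  return [Factor](ToyLoop &L) {
    L.Step *= static_cast<long>(Factor);
    L.TripCount /= Factor;
  };
}

// Interleaving: replicate the body; the control flow is left untouched.
LoopTransform interleave(unsigned Factor) {
  return [Factor](ToyLoop &L) { L.Copies *= Factor; };
}

// Each step mutates the loop in place, so steps compose in sequence and can
// be tested one at a time.
void applyAll(ToyLoop &L, const std::vector<LoopTransform> &Steps) {
  for (const LoopTransform &Step : Steps)
    Step(L);
}
```

In contrast, the one-pass model described above would correspond to a single function that performs all of these updates while walking the loop once, which is what makes individual pieces hard to isolate.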

So what I'd like to do is to incrementally create the helper classes LoopWidening/LoopInterleaving/LoopVectorizing separately. In order to get test coverage and move towards an end goal, I'd hook the new helper classes into LoopVectorize and make LoopVectorize use the new classes when possible. Something like this:

if (!VectorizeLoop) {
  assert(IC > 1 && "interleave count should not be 1 or 0");
  // If we decided that it is not legal to vectorize the loop then
  // interleave it.
  if (canBeHandledByLoopInterleaving()) {
    LoopVersioning LVer(L);
    LVer.versionLoop();
    LoopInterleaving LInt(LVer.getVersionedLoop());
    LInt.widenLoop();
  } else {
    InnerLoopUnroller Unroller(L, SE, LI, DT, TLI, TTI, IC);
    Unroller.vectorize(&LVL);
  }
  emitOptimizationRemark(F->getContext(), DEBUG_TYPE, *F, L->getStartLoc(),
                         Twine("interleaved loop (interleaved count: ") +
                             Twine(IC) + ")");
} else {
  ...

Initially the new codepath would be taken only when the loop is very simple (for example, it has only a single block) or when a debug option forces it.

I'm still not 100% happy that I haven't found a way to refactor the code already there :( What do you think? I don't want to spend a massive amount of time going in the wrong direction.

Cheers,

James

Hi James,

Thanks for pushing this forward, I think we really need to clean this up!

I don't think we'd like to have two independent vectorizers at the same time: (1) the old one, and (2) one based on the LoopEditor framework and invoked on simple loops (to begin with). The problem with that approach is that you're actually rewriting the vectorizer from scratch, which should really be a last resort. While I do admit that the code might not be ideal in places, I don't think we need to completely rewrite it. We can't be sure that the two will converge in the foreseeable future, and while trying to converge them we don't know how the LoopEditor implementation will evolve as it grows to cover all the corner cases that the existing vectorizer supports.

So, I think the best way is to actually refactor the existing code in small incremental steps (and by small I don't mean the number of lines changed - I mean the number of key points changed). For instance, you introduced a new class Recurrence - why not do that in the original code? This way the existing code would become closer to what you want in the end, and thus your patch would become smaller. I understand that it's not as easy as it sounds, but I think that's the proper way of making such changes. Another benefit of such an approach is that it would be much easier to review, since NFC changes would be separated from new features, and thus we could both better check NFC changes for possible bugs and discuss the features at a higher level.

That said, in the end I would like to see an API similar to what you proposed. I just think that growing it in parallel might backfire.

Thanks,
Michael

> Unfortunately the code in question is a bit of a monolith and has resisted all of my attempts at chiselling into better shape. The main problem is that any sane composition model would perform a sequence of operations on a loop, mutating it in place. This is the model I'd like to get to. But the current model is to do one pass over the source loop, injecting any cloned instructions into a single IRBuilder.

Can you please elaborate on the difficulties a bit more? Perhaps working through an example would help here.

Thanks,
Adam

Hi Adam, Michael,

On Friday I tried again, intending on reporting to you exactly why I failed again... and succeeded :)

I've now got a bunch of patches, culminating in changing LoopVectorize to use LoopVersioning, and I intend to refactor a bunch more too. The patches I've got up for review currently are:

http://reviews.llvm.org/D12284
http://reviews.llvm.org/D12285
http://reviews.llvm.org/D12286
http://reviews.llvm.org/D12289

Those are the ones that aren't quite NFC. I've got a bunch more that just do code cleanup (enabled by those patches) and are trivial.

I still need to throw more testing at these patches, but they all pass regression tests at least.

Cheers,

James

reames resigned from this revision. Oct 8 2015, 10:29 AM

reames removed a reviewer: reames.

Revision Contents

Path                                          Size
include/llvm/InitializePasses.h               1 line
include/llvm/Transforms/Utils/LoopEditor.h    413 lines
lib/Target/ARM/Thumb2ITBlockPass.cpp          1 line
lib/Transforms/Scalar/Scalar.cpp              1 line
lib/Transforms/Utils/CMakeLists.txt           1 line
lib/Transforms/Utils/LoopEditor.cpp           244 lines
lib/Transforms/Vectorize/LoopVectorize.cpp    748 lines

Diff 31326

include/llvm/InitializePasses.h

 [...]
 void initializeRewriteSymbolsPass(PassRegistry&);
 void initializeWinEHPreparePass(PassRegistry&);
 void initializePlaceBackedgeSafepointsImplPass(PassRegistry&);
 void initializePlaceSafepointsPass(PassRegistry&);
 void initializeDwarfEHPreparePass(PassRegistry&);
 void initializeFloat2IntPass(PassRegistry&);
 void initializeLoopDistributePass(PassRegistry&);
 void initializeSjLjEHPreparePass(PassRegistry&);
+void initializeLoopEditorTestPass(PassRegistry&);
 }

 #endif

include/llvm/Transforms/Utils/LoopEditor.h

This file was added.

				//===-- LoopEditor.h - High-level loop transformations --------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// The LoopEditor provides a toolkit for performing high-level transforms on
				// loops.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_SCALAR_UTILS_LOOPEDITOR_H
				#define LLVM_TRANSFORMS_SCALAR_UTILS_LOOPEDITOR_H

				#include <llvm/ADT/ArrayRef.h>
				#include <llvm/ADT/DenseMap.h>
				#include <llvm/ADT/SmallVector.h>
				#include <llvm/ADT/Twine.h>
				#include <llvm/IR/IRBuilder.h>
				#include <llvm/Transforms/Utils/ValueMapper.h>
				#include <functional>
				#include <map>

				namespace llvm {

				class BasicBlock;
				class DataLayout;
				class DominatorTree;
				class Loop;
				class LoopInfo;
				class PHINode;
				class SCEV;
				class ScalarEvolution;
				class Value;

				/// \brief This class provides a toolkit to create high-level operations on
				/// loops.
				///
				/// The functionality consists of a set of core functions which are designed
				/// to be composable, and a set of helper routines that stitch together
				/// the core functions to provide higher-level constructions such as loop
				/// versioning, or loop peeling.
				///
				/// Many functions can take a Delegate object, which allows the user to be
				/// notified when different events happen (such as for example "an instruction
				/// has been interleaved"). This allows the core functionality to remain
				/// simple and not need too much configuration, while allowing the user
				/// the flexibility to hook into the internals. Some delegate methods offer
				/// the user the chance to influence the edit operation.
				///
				/// All of the core functions are designed to keep LoopInfo, ScalarEvolution
				/// and DominatorTree up to date.
				class LoopEditor {
				public:
				/// A recurrence is any PHI node in the header block of a loop.
				class Recurrence {
				protected:
				enum RType {
				tySimple, tyReduction, tyInduction
				};
				/// Default constructor purely for operator[] on maps.
				Recurrence() {}
				Recurrence(PHINode *BO, Value *StartV, PHINode *Val) :
				BasedOn(BO), StartValue(StartV), Val(Val), Ty(tySimple) {
				}
				/// If a loop is cloned, its clone will have this member of each of its
				/// recurrences and inductions set to the original recurrence or induction.
				///
				/// This allows LoopEditor to reason about sets of PHIs and how to merge
				/// them together when stitching the outputs of loops to the input of other
				/// loops (such as in LoopEditor::addIncoming).
				PHINode *BasedOn;
				/// For an induction or a reduction, the "start value" of the PHI (the value
				/// incoming from the preheader).
				Value *StartValue;
				/// The PHInode itself.
				PHINode *Val;
				/// The type of the recurrence.
				RType Ty;
				// This class is deliberately opaque to users, but allow LoopEditor full
				// access.
				friend class LoopEditor;
				public:
				/// Return the start value of this recurrence
				Value *getStartValue() const { return StartValue; }
				/// Return the PHI in the loop header that identifies this recurrence.
				PHINode *getPHI() const { return Val; }
				bool operator < (const Recurrence &R) const {
				return Val < R.Val;
				}
				};

				/// A reduction is a type of recurrence that has one user outside the loop,
				/// and is inductive: R[i] = f(i) OP R[i-1], where OP is an associative
				/// binary operator or min()/max().
				class Reduction : public Recurrence {
				Reduction(Value *StartV, PHINode *Val) :
				Recurrence(nullptr, StartV, Val) {
				Ty = tyReduction;
				}
				/// Only available for operator[] on maps.
				Reduction() { assert(0 && "Unreachable!"); }
				public:
				/// Emit code to reduce Op1 and Op2 to one value.
				Value *createOp(IRBuilder<> &IRB, Value *Op1, Value *Op2) const;
				/// Methods for support type inquiry through isa, cast, and dyn_cast:
				static inline bool classof(const Recurrence *R) {
				return R->Ty == tyReduction;
				}
				// Open up to LoopEditor.
				friend class LoopEditor;
				};
				/// A semi-opaque descriptor for an induction. An induction is a recurrence
				/// with a constant inductive step: I[i] = C + I[i-1].
				class Induction : private Recurrence {
				Induction(Value *StartV, PHINode *Val) :
				Recurrence(nullptr, StartV, Val) {
				Ty = tyInduction;
				}
				/// Only available for operator[] on maps.
				Induction() { assert(0 && "Unreachable!"); }
				public:
				/// Methods for support type inquiry through isa, cast, and dyn_cast:
				static inline bool classof(const Recurrence *R) {
				return R->Ty == tyInduction;
				}

				// Open up to LoopEditor.
				friend class LoopEditor;
				};

				public:
				/// Delegates should be subclassed by users. They can optionally chain
				/// to other delegates for composability.
				///
				/// Delegates provide many callback functions, but only a subset (if any)
				/// will be invoked for any one core function call.
				///
				/// It is forbidden to modify the content of the loop when servicing one
				/// of the "notify*" callbacks.
				struct Delegate {
				Delegate *Next;

				Delegate() : Next(nullptr) {}
				Delegate(Delegate *Next) : Next(Next) {}
				virtual void anchor(); // Provide a home for the vtable.

				/// A call to interleave() is about to happen, for the given iteration.
				/// Iteration starts at 1 (the zero'th iteration would correspond to
				/// the original loop content).
				///
				/// Called by: widenAndInterleave()
				virtual void notifyInterleaveIterationStarting(unsigned Iteration) {
				if (Next) Next->notifyInterleaveIterationStarting(Iteration);
				}
				/// \c OldInst has been cloned to \c NewInst, during a call to
				/// \c interleave()
				///
				/// Called by: interleave()
				virtual void notifyInstructionInterleaved(Instruction *OldInst,
				Instruction *NewInst) {
				if (Next) Next->notifyInstructionInterleaved(OldInst, NewInst);
				}
				/// \c OldInst has been peeled out of the loop as \c NewInst.
				/// The peel iteration was \c Iteration, which starts from zero.
				///
				/// Called by: peelBefore(),peelAfter()
				virtual void notifyInstructionPeeled(Instruction *OldInst,
				Instruction *NewInst,
				unsigned Iteration) {
				if (Next) Next->notifyInstructionPeeled(OldInst, NewInst, Iteration);
				}
				/// During a clone() operation, \c V needs to be cloned. VM contains the
				/// values that have already been cloned (apart from PHI operands which
				/// will be remapped later).
				/// \return The value that V should have in the cloned loop.
				///
				/// Called by: clone()
				virtual Value *hookCloneValue(Value *V, ValueToValueMapTy &VM,
				IRBuilder<> &IRB) {
				if (Next) return Next->hookCloneValue(V, VM, IRB);
				return nullptr;
				}
				/// When interleaving a loop, each interleaved iteration will require an
				/// offset to be applied to induction variables. This hook defines how
				/// that offset is computed. For example, for an UF=4 interleave:
				/// iteration 0 uses IndVar+0
				/// iteration 1 uses IndVar+S
				/// iteration 2 uses IndVar+S*2
				/// iteration 3 uses IndVar+S*3
				///
				/// Where S is the return value of this hook.
				///
				/// Called by: widenAndInterleave()
				virtual unsigned hookInterleaveInductionStep() {
				if (Next) return Next->hookInterleaveInductionStep();
				return 1;
				}
				/// The reduction \c R, made up of \c Values, needs to be reduced to one
				/// value at the exit of the loop.
				///
				/// Called by: widenAndInterleave()
				virtual Value *hookReduceReductions(Reduction *R, ArrayRef<Value*> Values,
				IRBuilder<> &IRB) {
				if (Next) return Next->hookReduceReductions(R, Values, IRB);
				return nullptr;
				}
				};

				/// Different transforms require the loop to be analyzable to a different
				/// degree. For example, loop cloning doesn't need to know anything about
				/// the loop, whereas loop interleaving needs to know that all recurrences
				/// are either inductions or reductions.
				///
				/// The AnalysisLevel describes the extent to which the loop was analyzable;
				/// some functions may have minimum analysis requirements.
				enum AnalysisLevel {
				/// The loop was not able to be analyzed at all.
				AL_None,
				/// The loop trip count was able to be determined.
				AL_TripCount,
				/// The loop contains a single backedge and is in LCSSA form.
				AL_LCSSA,
				/// All recurrences were categorized (into inductions and reductions),
				/// as well as the trip count being identified.
				AL_AllRecurrences
				};

				/// Constructs a new loop editor, editing L.
				LoopEditor(Loop *L, ScalarEvolution *SE, const DataLayout *DL, LoopInfo *LI,
				DominatorTree *DT);

				///
				/// Queries
				///

				/// Returns the level to which the loop was able to be analyzed.
				AnalysisLevel getAnalysisLevel() const;

				/// Returns true if I is contained within blocks the loop editor owns.
				/// This is equivalent to Loop::contains(), but it also checks the optional
				/// bypass and dedicated exit blocks that are outside of Loop's knowledge.
				bool contains(Instruction *I);

				/// Allow accesses to the underlying Loop using ->.
				Loop *operator -> () const { return L; }

				/// Return the recurrence referred to by this PHI, if any. A recurrence is
				/// referred to by its PHI in the loop header block. Returns nullptr on
				/// failure.
				Recurrence *getRecurrenceByPHI(PHINode *V);

				/// Retrieves all known recurrences.
				ArrayRef<Recurrence*> getAllRecurrences();

				/// Returns the value of the given recurrence on exiting the loop.
				Value *getOutgoingRecurrence(const Recurrence &R);

				/// Retrieves a Recurrence valid for this loop, if one is found to be based
				/// on or derived from the given recurrence. This will happen if this loop has
				/// been cloned from another loop previously.
				Recurrence *getMatchingRecurrence(const Recurrence &R);

				/// Retrieves the executed trip count at the end of the loop. This may be
				/// zero if the loop was bypassed by a predicate, or less than the normal
				/// trip count if the loop was widened.
				///
				/// Requires AnalysisLevel AL_TripCount or above.
				Value *getExecutedTripCount();

				/// Retrieves the trip count of the loop as a SCEV. Note that this involves
				/// adding one to the backedge count and as such may overflow when expanded.
				///
				/// Requires AnalysisLevel AL_TripCount or above.
				const SCEV *getTripCount();

				///
				/// Mutations
				///

				/// Produce a new version of this loop. The new loop is returned in
				/// LoopEditor.
				///
				/// The new loop is inserted into the CFG on IncomingEdge (IncomingEdge is
				/// updated), so if IncomingEdge is A -> B, the result of this function will
				/// be A -> LOOP -> B.
				LoopEditor clone(Use &IncomingEdge, const Twine &NameSuffix = "");

				/// A "widened" loop iterates over the same range but with a wider step.
				/// For example the AddRec {5,1}<%L> widened by a factor 4 becomes {5,4}<%L>.
				///
				/// This is a requirement for loop interleaving and loop vectorization, but
				/// importantly this does not actually do any unrolling or vectorization.
				/// Instead, all induction variables have their steps altered and the trip
				/// count is modified.
				///
				/// Reductions are not altered. Requires AnalysisLevel AL_AllRecurrences.
				void widen(unsigned Factor);

				/// Set the loop trip count. If MayOverflow is set, then an overflow check
				/// will be emitted to ensure that TripCount is not zero.
				///
				/// The overflow check may be omitted if the LoopEditor can prove that
				/// TripCount is less than the current loop trip count, and there is currently
				/// no overflow check.
				///
				/// Requires AnalysisLevel AL_TripCount.
				void setTripCount(const SCEV *TripCount, bool MayOverflow=true);

				/// Adds a new reduction variable starting at StartValue. The reduction does
				/// not do anything until it is connected with connectReduction().
				///
				/// Requires AnalysisLevel AL_LCSSA.
				Reduction *addReduction(Value *StartValue);

				/// Connects the backedge of a reduction added with addReduction().
				///
				/// Requires AnalysisLevel AL_LCSSA.
				void connectReduction(const Reduction *ID, Value *V);

				/// Creates a new copy of all instructions in the loop body, except control flow
				/// instructions. Does not modify the control flow in the loop at all.
				///
				/// The inserted instructions are interleaved. Requires AnalysisLevel
				/// AL_AllRecurrences.
				///
				/// Invokes the following functions on the delegate:
				/// hookInductionVariableUse(Induction &Induction, IRBuilder<> IRB)
				/// Called when a use of an induction variable is encountered. Expected
				/// to return the value to use as an induction variable.
				///
				/// hookReductionVariableUse(Reduction &Reduction, IRBuilder<> IRB)
				/// Called when a use of a reduction variable is encountered. Expected
				/// to return the value to use as a reduction.
				///
				/// notifyReductionVariableDefined(Reduction &Reduction, Value *Def)
				/// Called when a feed of a reduction variable is seen. The arguments are
				/// the newly created Value and the Reduction it fed in the original loop.
				void interleave(Delegate *D);

				/// Returns a dedicated exiting block, creating one if it does not exist. A
				/// dedicated exiting block is one where the only incoming arcs are from
				/// the loop latch or optionally the bypass block (The bypass block is a
				/// check to see if the loop would actually be executed at least once, as
				/// the trip count check is on the backedge).
				BasicBlock *getOrCreateDedicatedExitingBlock();

				/// Changes the block that is executed after this loop.
				void updateExitBlock(BasicBlock *BB);

				/// Add an incoming edge from the loop L, with the given mappings between our
				/// and L's recurrences
				///
				/// The Recurrences can either reference this loop or L's.
				void addIncoming(LoopEditor &L,
				const std::map<Recurrence*, Value*> &Recurrences);

				/// Add an incoming edge from the basic block BB.
				void addIncoming(BasicBlock *BB);

				/// Remove an incoming edge from BB. The edge must already exist.
				void removeIncoming(BasicBlock *BB);

				///
				/// Macro-mutations - These are all helper functions implemented in terms of
				/// the mutations above.
				///

				/// Widens the loop by \c Factor and performs interleaved loop unrolling.
				///
				/// The reductions in the original loop get duplicated \c Factor times,
				/// then reduced at the end of the loop into one value and returned in Ret.
				void widenAndInterleaveLoop(unsigned Factor,
				std::map<Reduction*, Value*> &Ret,
				Delegate *D=nullptr);

				/// Versions, widens, and interleaves the loop by \c Factor.
				///
				/// The original loop is cloned and a widened loop created, which branches
				/// to the original scalar loop as a tail.
				///
				/// PredBB is created as a block to determine whether to branch to the widened
				/// loop or the scalar one (independent of any trip count and overflow checks).
				/// It is initially set up with just one branch:
				/// br i1 true, label %versioned-loop, label %scalar-loop
				LoopEditor versionWidenAndInterleaveLoop(unsigned Factor, BasicBlock *&PredBB,
				Delegate *D=nullptr);

				/// Peels \c Iterations iterations out of the loop and places them before the
				/// loop. The trip count of the loop is decreased to compensate.
				void peelBefore(unsigned Iterations, Delegate *D=nullptr);

				/// Peels \c Iterations iterations out of the loop and places them after the
				/// loop. The trip count of the loop is decreased to compensate.
				void peelAfter(unsigned Iterations, Delegate *D=nullptr);

				private:
				//
				// Immutable/analysis state
				//
				Loop *L;
				ScalarEvolution *SE;
				const DataLayout *DL;
				LoopInfo *LI;
				DominatorTree *DT;
				bool Analyzed;
				};
				} // end namespace llvm

				#endif
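
The comment on widenAndInterleaveLoop says the reductions are duplicated Factor times and then reduced into one value at the end of the loop. As a rough source-level analogy only (this is not the LoopEditor implementation, and the names sumScalar and sumInterleaved are hypothetical), the effect on a simple sum reduction with Factor = 4 could be sketched as:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: the original scalar loop with a single reduction variable.
long sumScalar(const std::vector<int> &A) {
  long Sum = 0;
  for (int V : A)
    Sum += V;
  return Sum;
}

// Hypothetical analogue of widening and interleaving by Factor = 4.
long sumInterleaved(const std::vector<int> &A) {
  constexpr std::size_t Factor = 4;
  // The one reduction is duplicated into Factor partial accumulators.
  std::array<long, Factor> Partial = {0, 0, 0, 0};
  const std::size_t N = A.size();
  const std::size_t NVec = N - (N % Factor); // iterations the wide loop covers

  // Widened loop: each trip handles Factor original iterations.
  for (std::size_t I = 0; I < NVec; I += Factor)
    for (std::size_t J = 0; J < Factor; ++J)
      Partial[J] += A[I + J];

  // Reduce the duplicated reductions into one value after the loop.
  long Sum = 0;
  for (long P : Partial)
    Sum += P;

  // Scalar tail: the remaining N % Factor iterations.
  for (std::size_t I = NVec; I < N; ++I)
    Sum += A[I];
  return Sum;
}
```

The scalar tail here corresponds to what versionWidenAndInterleaveLoop describes as branching to the original scalar loop; in the actual API that wiring would be done on IR, not on source loops.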

lib/Target/ARM/Thumb2ITBlockPass.cpp

bool Thumb2ITBlockPass::InsertITInstructions(MachineBasicBlock &MBB) {
SmallSet<unsigned, 4> Defs;		SmallSet<unsigned, 4> Defs;
SmallSet<unsigned, 4> Uses;		SmallSet<unsigned, 4> Uses;
MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();		MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
while (MBBI != E) {		while (MBBI != E) {
MachineInstr *MI = &*MBBI;		MachineInstr *MI = &*MBBI;
DebugLoc dl = MI->getDebugLoc();		DebugLoc dl = MI->getDebugLoc();
unsigned PredReg = 0;		unsigned PredReg = 0;
ARMCC::CondCodes CC = getITInstrPredicate(MI, PredReg);		ARMCC::CondCodes CC = getITInstrPredicate(MI, PredReg);

		rengolin: oops
if (CC == ARMCC::AL) {		if (CC == ARMCC::AL) {
++MBBI;		++MBBI;
continue;		continue;
}		}

Defs.clear();		Defs.clear();
Uses.clear();		Uses.clear();
TrackDefUses(MI, Defs, Uses, TRI);		TrackDefUses(MI, Defs, Uses, TRI);

lib/Transforms/Scalar/Scalar.cpp

void llvm::initializeScalarOpts(PassRegistry &Registry) {
initializeSeparateConstOffsetFromGEPPass(Registry);		initializeSeparateConstOffsetFromGEPPass(Registry);
initializeSpeculativeExecutionPass(Registry);		initializeSpeculativeExecutionPass(Registry);
initializeStraightLineStrengthReducePass(Registry);		initializeStraightLineStrengthReducePass(Registry);
initializeLoadCombinePass(Registry);		initializeLoadCombinePass(Registry);
initializePlaceBackedgeSafepointsImplPass(Registry);		initializePlaceBackedgeSafepointsImplPass(Registry);
initializePlaceSafepointsPass(Registry);		initializePlaceSafepointsPass(Registry);
initializeFloat2IntPass(Registry);		initializeFloat2IntPass(Registry);
initializeLoopDistributePass(Registry);		initializeLoopDistributePass(Registry);
		initializeLoopEditorTestPass(Registry);
}		}

void LLVMInitializeScalarOpts(LLVMPassRegistryRef R) {		void LLVMInitializeScalarOpts(LLVMPassRegistryRef R) {
initializeScalarOpts(*unwrap(R));		initializeScalarOpts(*unwrap(R));
}		}

void LLVMAddAggressiveDCEPass(LLVMPassManagerRef PM) {		void LLVMAddAggressiveDCEPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createAggressiveDCEPass());		unwrap(PM)->add(createAggressiveDCEPass());

lib/Transforms/Utils/CMakeLists.txt

add_llvm_library(LLVMTransformUtils
DemoteRegToStack.cpp		DemoteRegToStack.cpp
FlattenCFG.cpp		FlattenCFG.cpp
GlobalStatus.cpp		GlobalStatus.cpp
InlineFunction.cpp		InlineFunction.cpp
InstructionNamer.cpp		InstructionNamer.cpp
IntegerDivision.cpp		IntegerDivision.cpp
LCSSA.cpp		LCSSA.cpp
Local.cpp		Local.cpp
		LoopEditor.cpp
LoopSimplify.cpp		LoopSimplify.cpp
LoopUnroll.cpp		LoopUnroll.cpp
LoopUnrollRuntime.cpp		LoopUnrollRuntime.cpp
LoopUtils.cpp		LoopUtils.cpp
LoopVersioning.cpp		LoopVersioning.cpp
LowerInvoke.cpp		LowerInvoke.cpp
LowerSwitch.cpp		LowerSwitch.cpp
Mem2Reg.cpp		Mem2Reg.cpp

lib/Transforms/Utils/LoopEditor.cpp

This file was added.

				//===-- LoopEditor.cpp - High-level loop transformations ------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// The LoopEditor provides a toolkit for performing high-level transforms on
				// loops.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/Utils/LoopEditor.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/ScalarEvolution.h"
				#include "llvm/Analysis/ScalarEvolutionExpander.h"
				#include "llvm/Analysis/ScalarEvolutionExpressions.h"
				#include "llvm/IR/BasicBlock.h"
				#include "llvm/IR/DataLayout.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/InstIterator.h"
				#include "llvm/IR/Module.h"
				#include "llvm/IR/Value.h"
				#include "llvm/IR/Verifier.h"
				#include "llvm/Transforms/Utils/Cloning.h"
				#include "llvm/Transforms/Utils/LoopUtils.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/PrettyStackTrace.h"

				#define DEBUG_TYPE "loopeditor"
				using namespace llvm;

				void LoopEditor::Delegate::anchor() {}

				//
				// Helper functions.
				//
				Value *LoopEditor::Reduction::createOp(IRBuilder<> &IRB, Value *Op1, Value *Op2) const {
				assert(0 && "Implement!");
				}

				//
				// Public API
				//

				LoopEditor::LoopEditor(Loop *L, ScalarEvolution *SE, const DataLayout *DL,
				LoopInfo *LI, DominatorTree *DT)
				: L(L), SE(SE), DL(DL), LI(LI), DT(DT) {
				}

				LoopEditor::AnalysisLevel LoopEditor::getAnalysisLevel() const {
				assert(0 && "Not implemented!");
				return AL_None;
				}

				bool LoopEditor::contains(Instruction *I) {
				assert(0 && "Not implemented!");
				return false;
				}

				LoopEditor::Recurrence *LoopEditor::getRecurrenceByPHI(PHINode *V) {
				assert(0 && "Not implemented!");
				return nullptr;
				}

				ArrayRef<LoopEditor::Recurrence*> LoopEditor::getAllRecurrences() {
				assert(0 && "Not implemented!");
				return ArrayRef<LoopEditor::Recurrence*>();
				}

				Value *LoopEditor::getOutgoingRecurrence(const Recurrence &R) {
				assert(0 && "Not implemented!");
				return nullptr;
				}

				LoopEditor::Recurrence *LoopEditor::getMatchingRecurrence(const Recurrence &R) {
				assert(0 && "Not implemented!");
				return nullptr;
				}

				Value *LoopEditor::getExecutedTripCount() {
				assert(0 && "Not implemented!");
				return nullptr;
				}

				const SCEV *LoopEditor::getTripCount() {
				assert(0 && "Not implemented!");
				return nullptr;
				}

				void LoopEditor::setTripCount(const SCEV *TC, bool MayOverflow) {
				assert(0 && "Not implemented!");
				}

				LoopEditor LoopEditor::clone(Use &IncomingEdge,
				const Twine &NameSuffix) {
				assert(0 && "Not implemented!");
				return *this;
				}

				void LoopEditor::widen(unsigned Factor) {
				assert(0 && "Not implemented!");
				}

				LoopEditor::Reduction *LoopEditor::addReduction(Value *StartValue) {
				assert(0 && "Not implemented!");
				return nullptr;
				}

				void LoopEditor::connectReduction(const Reduction *ID, Value *V) {
				assert(0 && "Not implemented!");
				}

				void LoopEditor::interleave(Delegate *D) {
				assert(0 && "Not implemented!");
				}

				BasicBlock *LoopEditor::getOrCreateDedicatedExitingBlock() {
				assert(0 && "Not implemented!");
				return nullptr;
				}

				void LoopEditor::updateExitBlock(BasicBlock *BB) {
				assert(0 && "Not implemented!");
				}

				void LoopEditor::addIncoming(
				LoopEditor &LE,
				const std::map<Recurrence*, Value*> &Recurrences) {
				assert(0 && "Not implemented!");
				}

				void LoopEditor::addIncoming(BasicBlock *BB) {
				assert(0 && "Not implemented!");
				}

				void LoopEditor::removeIncoming(BasicBlock *BB) {
				assert(0 && "Not implemented!");
				}

				//
				// Macro-mutators
				//

				void LoopEditor::widenAndInterleaveLoop(unsigned Factor,
				std::map<Reduction*, Value*> &Ret,
				Delegate *D) {
				assert(0 && "Not implemented!");
				}

				LoopEditor LoopEditor::versionWidenAndInterleaveLoop(unsigned Factor,
				BasicBlock *&PredBB,
				Delegate *D) {
				assert(0 && "Not implemented!");
				}

				void LoopEditor::peelBefore(unsigned Iterations, Delegate *D) {
				assert(0 && "Not implemented!");
				}

				void LoopEditor::peelAfter(unsigned Iterations, Delegate *D) {
				assert(0 && "Not implemented!");
				}

				//
				// Dummy test pass
				//
				enum TestCommand {
				TC_Clone
				};
				cl::opt<TestCommand> TestCommand(
				cl::desc("Loop editor test command"), cl::Hidden, cl::values(
				clEnumValN(TC_Clone, "clone", "Test the ::clone() function"),
				clEnumValEnd));
				cl::opt<std::string> TestLoop(cl::desc("Loop editor loop to operate on"),
				cl::Hidden, cl::init("TestLoop"));
				cl::opt<std::string> TestEdge(cl::desc("Loop editor edge to clone on"),
				cl::Hidden, cl::init("TestEdge"));

				struct LoopEditorTest : FunctionPass {
				static char ID;
				LoopEditorTest() : FunctionPass(ID) {
				initializeLoopEditorTestPass(*PassRegistry::getPassRegistry());
				}
				bool runOnFunction(Function &F) override;
				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<LoopInfoWrapperPass>();
				AU.addRequired<DominatorTreeWrapperPass>();
				AU.addRequired<ScalarEvolution>();
				}
				};
				char LoopEditorTest::ID = 0;
				INITIALIZE_PASS_BEGIN(LoopEditorTest, "test-loop-editor",
				"Dummy test pass to test LoopEditor",
				false, false)
				INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(ScalarEvolution)
				INITIALIZE_PASS_END(LoopEditorTest, "test-loop-editor",
				"Dummy test pass to test LoopEditor",
				false, false)

				bool LoopEditorTest::runOnFunction(Function &F) {
				LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
				DominatorTree &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
				ScalarEvolution &SE = getAnalysis<ScalarEvolution>();

				auto I = std::find_if(F.begin(), F.end(), [](const BasicBlock &BB) {
				return BB.getName() == TestLoop;
				});
				if (I == F.end())
				report_fatal_error("Could not find basicblock %" + TestLoop +
				" in function @" + F.getName(), false);

				Loop *L = LI.getLoopFor(I);
				if (!L)
				report_fatal_error("Could not find loop for BB %" + TestLoop +
				" in function @" + F.getName(), false);

				LoopEditor LE(L, &SE, &F.getParent()->getDataLayout(), &LI, &DT);
				switch (TestCommand) {
				case TC_Clone: {
				auto I = std::find_if(F.begin(), F.end(), [](const BasicBlock &BB) {
				return BB.getName() == TestEdge;
				});
				if (I == F.end())
				report_fatal_error("Could not find test edge %" + TestEdge, false);
				if (I->size() != 1)
				report_fatal_error("Test edge %" + TestEdge + " does not contain solely"
				" an unconditional branch!", false);
				BranchInst *BI = dyn_cast<BranchInst>(I->getTerminator());
				if (!BI || BI->isConditional())
				report_fatal_error("Test edge %" + TestEdge + " was not an unconditional"
				" branch!", false);
				Use &Edge = *BI->op_begin();
				LE.clone(Edge, ".cloned");
				}
				}

				return true;
				}
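
The peelBefore and peelAfter stubs above are documented in the header as moving Iterations iterations out of the loop and decreasing the trip count to compensate. A minimal model of that contract, assuming a loop over indices 0..N-1 and Iterations <= N, might look like the following; PeelResult, peelBeforeModel and peelAfterModel are hypothetical names for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PeelResult {
  std::vector<std::size_t> Visited; // original iteration indices, in execution order
  std::size_t LoopTrips = 0;        // iterations still executed by the remaining loop
};

// Model of peelBefore: the first `Iterations` iterations run ahead of the loop.
PeelResult peelBeforeModel(std::size_t N, std::size_t Iterations) {
  assert(Iterations <= N && "cannot peel more iterations than the loop has");
  PeelResult R;
  for (std::size_t I = 0; I < Iterations; ++I) // stands in for the peeled copies
    R.Visited.push_back(I);
  for (std::size_t I = Iterations; I < N; ++I) { // remaining loop, reduced trip count
    R.Visited.push_back(I);
    ++R.LoopTrips;
  }
  return R;
}

// Model of peelAfter: the last `Iterations` iterations run after the loop.
PeelResult peelAfterModel(std::size_t N, std::size_t Iterations) {
  assert(Iterations <= N && "cannot peel more iterations than the loop has");
  PeelResult R;
  for (std::size_t I = 0; I < N - Iterations; ++I) { // remaining loop
    R.Visited.push_back(I);
    ++R.LoopTrips;
  }
  for (std::size_t I = N - Iterations; I < N; ++I) // stands in for the peeled copies
    R.Visited.push_back(I);
  return R;
}
```

In both models the order of visited iterations is unchanged and the remaining loop runs N - Iterations times, which is the invariant a test of the real implementation would presumably want to check.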

lib/Transforms/Vectorize/LoopVectorize.cpp

#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Analysis/VectorUtils.h"		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/Transforms/Utils/LoopUtils.h"		#include "llvm/Transforms/Utils/LoopUtils.h"
		#include "llvm/Transforms/Utils/LoopEditor.h"
#include <algorithm>		#include <algorithm>
#include <map>		#include <map>
#include <tuple>		#include <tuple>

using namespace llvm;		using namespace llvm;
using namespace llvm::PatternMatch;		using namespace llvm::PatternMatch;

#define LV_NAME "loop-vectorize"		#define LV_NAME "loop-vectorize"
public:

// Return true if any runtime check is added.		// Return true if any runtime check is added.
bool IsSafetyChecksAdded() {		bool IsSafetyChecksAdded() {
return AddedSafetyChecks;		return AddedSafetyChecks;
}		}

virtual ~InnerLoopVectorizer() {}		virtual ~InnerLoopVectorizer() {}

protected:		public:
/// A small list of PHINodes.		/// A small list of PHINodes.
typedef SmallVector<PHINode*, 4> PhiVector;		typedef SmallVector<PHINode*, 4> PhiVector;
/// When we unroll loops we have multiple vector values for each scalar.		/// When we unroll loops we have multiple vector values for each scalar.
/// This data structure holds the unrolled and vectorized values that		/// This data structure holds the unrolled and vectorized values that
/// originated from one scalar instruction.		/// originated from one scalar instruction.
typedef SmallVector<Value*, 2> VectorParts;		typedef SmallVector<Value*, 2> VectorParts;

// When we if-convert we need to create edge masks. We have to cache values		// When we if-convert we need to create edge masks. We have to cache values
public:
/// mask for the block BB.		/// mask for the block BB.
VectorParts createBlockInMask(BasicBlock *BB);		VectorParts createBlockInMask(BasicBlock *BB);
/// A helper function that computes the predicate of the edge between SRC		/// A helper function that computes the predicate of the edge between SRC
/// and DST.		/// and DST.
VectorParts createEdgeMask(BasicBlock *Src, BasicBlock *Dst);		VectorParts createEdgeMask(BasicBlock *Src, BasicBlock *Dst);

/// A helper function to vectorize a single BB within the innermost loop.		/// A helper function to vectorize a single BB within the innermost loop.
void vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV);		void vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV);
		void vectorizeSingleInstruction(Instruction *I, PhiVector *PV);

/// Vectorize a single PHINode in a block. This method handles the induction		/// Vectorize a single PHINode in a block. This method handles the induction
/// variable canonicalization. It supports both VF = 1 for unrolled loops and		/// variable canonicalization. It supports both VF = 1 for unrolled loops and
/// arbitrary length vectors.		/// arbitrary length vectors.
void widenPHIInstruction(Instruction *PN, VectorParts &Entry,		void widenPHIInstruction(Instruction *PN, VectorParts &Entry,
unsigned UF, unsigned VF, PhiVector *PV);		unsigned UF, unsigned VF, PhiVector *PV);

/// Insert the new loop to the loop hierarchy and pass manager		/// Insert the new loop to the loop hierarchy and pass manager
public:
const TargetLibraryInfo *TLI;		const TargetLibraryInfo *TLI;
/// Target Transform Info.		/// Target Transform Info.
const TargetTransformInfo *TTI;		const TargetTransformInfo *TTI;

/// The vectorization SIMD factor to use. Each vector will have this many		/// The vectorization SIMD factor to use. Each vector will have this many
/// vector elements.		/// vector elements.
unsigned VF;		unsigned VF;

protected:		friend class VectorizerDelegate;
		public:
/// The vectorization unroll factor to use. Each scalar is vectorized to this		/// The vectorization unroll factor to use. Each scalar is vectorized to this
/// many different vector instructions.		/// many different vector instructions.
unsigned UF;		unsigned UF;

/// The builder that we use		/// The builder that we use
IRBuilder<> Builder;		IRBuilder<> Builder;

// --- Vectorization state ---		// --- Vectorization state ---
Instruction *TheCheck =
BinaryOperator::CreateAnd(Check, ConstantInt::getTrue(Ctx));		BinaryOperator::CreateAnd(Check, ConstantInt::getTrue(Ctx));
ChkBuilder.Insert(TheCheck, "stride.not.one");		ChkBuilder.Insert(TheCheck, "stride.not.one");
FirstInst = getFirstInst(FirstInst, TheCheck, Loc);		FirstInst = getFirstInst(FirstInst, TheCheck, Loc);

return std::make_pair(FirstInst, TheCheck);		return std::make_pair(FirstInst, TheCheck);
}		}

void InnerLoopVectorizer::createEmptyLoop() {		void InnerLoopVectorizer::createEmptyLoop() {
/*		LoopVectorizeHints Hints(OrigLoop, true);
In this function we generate a new loop. The new loop will contain
the vectorized instructions while the old loop will continue to run the
scalar remainder.

        [ ] <-- Back-edge taken count overflow check.
     /   |
    /    v
   |    [ ] <-- vector loop bypass (may consist of multiple blocks).
   |  /  |
   | /   v
   ||   [ ] <-- vector pre header.
   ||    |
   ||    v
   ||   [ ] \
   ||   [ ]_| <-- vector loop.
   ||    |
   | \   v
   |   >[ ] <--- middle-block.
   |  /  |
   | /   v
  -|-  >[ ] <--- new preheader.
   |     |
   |     v
   |    [ ] \
   |    [ ]_| <-- old scalar loop to handle remainder.
    \    |
     \   v
       >[ ] <-- exit block.
...
*/

BasicBlock *OldBasicBlock = OrigLoop->getHeader();
BasicBlock *VectorPH = OrigLoop->getLoopPreheader();
BasicBlock *ExitBlock = OrigLoop->getExitBlock();
assert(VectorPH && "Invalid loop structure");
assert(ExitBlock && "Must have an exit block");

// Some loops have a single integer induction variable, while other loops
// don't. One example is c++ iterators that often have multiple pointer
// induction variables. In the code below we also support a case where we
// don't have a single induction variable.
OldInduction = Legal->getInduction();
Type *IdxTy = Legal->getWidestInductionType();

// Find the loop boundaries.
const SCEV *ExitCount = SE->getBackedgeTakenCount(OrigLoop);
assert(ExitCount != SE->getCouldNotCompute() && "Invalid loop count");

// The exit count might have the type of i64 while the phi is i32. This can
// happen if we have an induction variable that is sign extended before the
// compare. The only way that we get a backedge taken count is that the
// induction variable was signed and as such will not overflow. In such a case
// truncation is legal.
if (ExitCount->getType()->getPrimitiveSizeInBits() >
IdxTy->getPrimitiveSizeInBits())
ExitCount = SE->getTruncateOrNoop(ExitCount, IdxTy);

const SCEV *BackedgeTakeCount = SE->getNoopOrZeroExtend(ExitCount, IdxTy);
// Get the total trip count from the count by adding 1.
ExitCount = SE->getAddExpr(BackedgeTakeCount,
SE->getConstant(BackedgeTakeCount->getType(), 1));

const DataLayout &DL = OldBasicBlock->getModule()->getDataLayout();

// Expand the trip count and place the new instructions in the preheader.
// Notice that the pre-header does not change, only the loop body.
SCEVExpander Exp(*SE, DL, "induction");

// We need to test whether the backedge-taken count is uint##_max. Adding one
// to it will cause overflow and an incorrect loop trip count in the vector
// body. In case of overflow we want to directly jump to the scalar remainder
// loop.
Value *BackedgeCount =
Exp.expandCodeFor(BackedgeTakeCount, BackedgeTakeCount->getType(),
VectorPH->getTerminator());
if (BackedgeCount->getType()->isPointerTy())
BackedgeCount = CastInst::CreatePointerCast(BackedgeCount, IdxTy,
"backedge.ptrcnt.to.int",
VectorPH->getTerminator());
Instruction *CheckBCOverflow =
CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_EQ, BackedgeCount,
Constant::getAllOnesValue(BackedgeCount->getType()),
"backedge.overflow", VectorPH->getTerminator());

// The loop index does not have to start at Zero. Find the original start
// value from the induction PHI node. If we don't have an induction variable
// then we know that it starts at zero.
Builder.SetInsertPoint(VectorPH->getTerminator());
Value *StartIdx = ExtendedIdx =
OldInduction
? Builder.CreateZExt(OldInduction->getIncomingValueForBlock(VectorPH),
IdxTy)
: ConstantInt::get(IdxTy, 0);

// Count holds the overall loop count (N).
Value *Count = Exp.expandCodeFor(ExitCount, ExitCount->getType(),
VectorPH->getTerminator());

LoopBypassBlocks.push_back(VectorPH);

// Split the single block loop into the two loop structure described above.
BasicBlock *VecBody =
VectorPH->splitBasicBlock(VectorPH->getTerminator(), "vector.body");
BasicBlock *MiddleBlock =
VecBody->splitBasicBlock(VecBody->getTerminator(), "middle.block");
BasicBlock *ScalarPH =
MiddleBlock->splitBasicBlock(MiddleBlock->getTerminator(), "scalar.ph");

// Create and register the new vector loop.
Loop* Lp = new Loop();
Loop *ParentLoop = OrigLoop->getParentLoop();

// Insert the new loop into the loop nest and register the new basic blocks
// before calling any utilities such as SCEV that require valid LoopInfo.
if (ParentLoop) {
ParentLoop->addChildLoop(Lp);
ParentLoop->addBasicBlockToLoop(ScalarPH, *LI);
ParentLoop->addBasicBlockToLoop(MiddleBlock, *LI);
} else {
LI->addTopLevelLoop(Lp);
}
Lp->addBasicBlockToLoop(VecBody, *LI);

// Use this IR builder to create the loop instructions (Phi, Br, Cmp)
// inside the loop.
Builder.SetInsertPoint(VecBody->getFirstNonPHI());

// Generate the induction variable.
setDebugLocFromInst(Builder, getDebugLocFromInstOrOperands(OldInduction));
Induction = Builder.CreatePHI(IdxTy, 2, "index");
// The loop step is equal to the vectorization factor (num of SIMD elements)
// times the unroll factor (num of SIMD instructions).
Constant *Step = ConstantInt::get(IdxTy, VF * UF);

// Generate code to check that the loop's trip count that we computed by
// adding one to the backedge-taken count will not overflow.
BasicBlock *NewVectorPH =
VectorPH->splitBasicBlock(VectorPH->getTerminator(), "overflow.checked");
if (ParentLoop)
ParentLoop->addBasicBlockToLoop(NewVectorPH, *LI);
ReplaceInstWithInst(
VectorPH->getTerminator(),
BranchInst::Create(ScalarPH, NewVectorPH, CheckBCOverflow));
VectorPH = NewVectorPH;

// This is the IR builder that we use to add all of the logic for bypassing
// the new vector loop.
IRBuilder<> BypassBuilder(VectorPH->getTerminator());
setDebugLocFromInst(BypassBuilder,
getDebugLocFromInstOrOperands(OldInduction));

// We may need to extend the index in case there is a type mismatch.
// We know that the count starts at zero and does not overflow.
if (Count->getType() != IdxTy) {
// The exit count can be of pointer type. Convert it to the correct
// integer type.
if (ExitCount->getType()->isPointerTy())
Count = BypassBuilder.CreatePointerCast(Count, IdxTy, "ptrcnt.to.int");
else
Count = BypassBuilder.CreateZExtOrTrunc(Count, IdxTy, "cnt.cast");
}

// Add the start index to the loop count to get the new end index.
Value *IdxEnd = BypassBuilder.CreateAdd(Count, StartIdx, "end.idx");

// Now we need to generate the expression for N - (N % VF), which is
// the part that the vectorized body will execute.
Value *R = BypassBuilder.CreateURem(Count, Step, "n.mod.vf");
Value *CountRoundDown = BypassBuilder.CreateSub(Count, R, "n.vec");
Value *IdxEndRoundDown = BypassBuilder.CreateAdd(CountRoundDown, StartIdx,
"end.idx.rnd.down");

// Now, compare the new count to zero. If it is zero skip the vector loop and
// jump to the scalar loop.
Value *Cmp =
BypassBuilder.CreateICmpEQ(IdxEndRoundDown, StartIdx, "cmp.zero");
NewVectorPH =
VectorPH->splitBasicBlock(VectorPH->getTerminator(), "vector.ph");
if (ParentLoop)
ParentLoop->addBasicBlockToLoop(NewVectorPH, *LI);
LoopBypassBlocks.push_back(VectorPH);
ReplaceInstWithInst(VectorPH->getTerminator(),
BranchInst::Create(MiddleBlock, NewVectorPH, Cmp));
VectorPH = NewVectorPH;

// Generate the code to check that the strides we assumed to be one are really
// one. We want the new basic block to start at the first instruction in a
// sequence of instructions that form a check.
Instruction *StrideCheck;
Instruction *FirstCheckInst;
std::tie(FirstCheckInst, StrideCheck) =
addStrideCheck(VectorPH->getTerminator());
if (StrideCheck) {
AddedSafetyChecks = true;
// Create a new block containing the stride check.
VectorPH->setName("vector.stridecheck");
NewVectorPH =
VectorPH->splitBasicBlock(VectorPH->getTerminator(), "vector.ph");
if (ParentLoop)
ParentLoop->addBasicBlockToLoop(NewVectorPH, *LI);
LoopBypassBlocks.push_back(VectorPH);

// Replace the branch into the memory check block with a conditional branch
// for the "few elements case".
ReplaceInstWithInst(
VectorPH->getTerminator(),
BranchInst::Create(MiddleBlock, NewVectorPH, StrideCheck));

VectorPH = NewVectorPH;
}

// Generate the code that checks in runtime if arrays overlap. We put the
// checks into a separate block to make the more common case of few elements
// faster.
Instruction *MemRuntimeCheck;
std::tie(FirstCheckInst, MemRuntimeCheck) =
Legal->getLAI()->addRuntimeCheck(VectorPH->getTerminator());
if (MemRuntimeCheck) {
AddedSafetyChecks = true;
// Create a new block containing the memory check.
VectorPH->setName("vector.memcheck");
NewVectorPH =
VectorPH->splitBasicBlock(VectorPH->getTerminator(), "vector.ph");
if (ParentLoop)
ParentLoop->addBasicBlockToLoop(NewVectorPH, *LI);
LoopBypassBlocks.push_back(VectorPH);

// Replace the branch into the memory check block with a conditional branch
// for the "few elements case".
ReplaceInstWithInst(
VectorPH->getTerminator(),
BranchInst::Create(MiddleBlock, NewVectorPH, MemRuntimeCheck));

VectorPH = NewVectorPH;
}

// We are going to resume the execution of the scalar loop.
// Go over all of the induction variables that we found and fix the
// PHIs that are left in the scalar version of the loop.
// The starting values of PHI nodes depend on the counter of the last
// iteration in the vectorized loop.
// If we come from a bypass edge then we need to start from the original
// start value.

// This variable saves the new starting index for the scalar loop.
PHINode *ResumeIndex = nullptr;
LoopVectorizationLegality::InductionList::iterator I, E;
LoopVectorizationLegality::InductionList *List = Legal->getInductionVars();
// Set builder to point to last bypass block.
BypassBuilder.SetInsertPoint(LoopBypassBlocks.back()->getTerminator());
for (I = List->begin(), E = List->end(); I != E; ++I) {
PHINode *OrigPhi = I->first;
LoopVectorizationLegality::InductionInfo II = I->second;

Type *ResumeValTy = (OrigPhi == OldInduction) ? IdxTy : OrigPhi->getType();
PHINode *ResumeVal = PHINode::Create(ResumeValTy, 2, "resume.val",
MiddleBlock->getTerminator());
// We might have extended the type of the induction variable but we need a
// truncated version for the scalar loop.
PHINode *TruncResumeVal = (OrigPhi == OldInduction) ?
PHINode::Create(OrigPhi->getType(), 2, "trunc.resume.val",
MiddleBlock->getTerminator()) : nullptr;

// Create phi nodes to merge from the backedge-taken check block.
PHINode *BCResumeVal = PHINode::Create(ResumeValTy, 3, "bc.resume.val",
ScalarPH->getTerminator());
BCResumeVal->addIncoming(ResumeVal, MiddleBlock);

PHINode *BCTruncResumeVal = nullptr;
if (OrigPhi == OldInduction) {
BCTruncResumeVal =
PHINode::Create(OrigPhi->getType(), 2, "bc.trunc.resume.val",
ScalarPH->getTerminator());
BCTruncResumeVal->addIncoming(TruncResumeVal, MiddleBlock);
}

Value *EndValue = nullptr;
switch (II.IK) {
case LoopVectorizationLegality::IK_NoInduction:
llvm_unreachable("Unknown induction");
case LoopVectorizationLegality::IK_IntInduction: {
// Handle the integer induction counter.
assert(OrigPhi->getType()->isIntegerTy() && "Invalid type");

// We have the canonical induction variable.
if (OrigPhi == OldInduction) {
// Create a truncated version of the resume value for the scalar loop,
// we might have promoted the type to a larger width.
EndValue =
BypassBuilder.CreateTrunc(IdxEndRoundDown, OrigPhi->getType());
// The new PHI merges the original incoming value, in case of a bypass,
// or the value at the end of the vectorized loop.
for (unsigned I = 1, E = LoopBypassBlocks.size(); I != E; ++I)
TruncResumeVal->addIncoming(II.StartValue, LoopBypassBlocks[I]);
TruncResumeVal->addIncoming(EndValue, VecBody);

BCTruncResumeVal->addIncoming(II.StartValue, LoopBypassBlocks[0]);

// We know what the end value is.
EndValue = IdxEndRoundDown;
// We also know which PHI node holds it.
ResumeIndex = ResumeVal;
break;
}

// Not the canonical induction variable - add the vector loop count to the
// start value.
Value *CRD = BypassBuilder.CreateSExtOrTrunc(CountRoundDown,
II.StartValue->getType(),
"cast.crd");
EndValue = II.transform(BypassBuilder, CRD);
EndValue->setName("ind.end");
break;
}
case LoopVectorizationLegality::IK_PtrInduction: {
Value *CRD = BypassBuilder.CreateSExtOrTrunc(CountRoundDown,
II.StepValue->getType(),
"cast.crd");
EndValue = II.transform(BypassBuilder, CRD);
EndValue->setName("ptr.ind.end");
break;
}
}// end of case

// The new PHI merges the original incoming value, in case of a bypass,
// or the value at the end of the vectorized loop.
for (unsigned I = 1, E = LoopBypassBlocks.size(); I != E; ++I) {
if (OrigPhi == OldInduction)
ResumeVal->addIncoming(StartIdx, LoopBypassBlocks[I]);
else
ResumeVal->addIncoming(II.StartValue, LoopBypassBlocks[I]);
}
ResumeVal->addIncoming(EndValue, VecBody);

// Fix the scalar body counter (PHI node).
unsigned BlockIdx = OrigPhi->getBasicBlockIndex(ScalarPH);

// The old induction's phi node in the scalar body needs the truncated
// value.
if (OrigPhi == OldInduction) {
BCResumeVal->addIncoming(StartIdx, LoopBypassBlocks[0]);
OrigPhi->setIncomingValue(BlockIdx, BCTruncResumeVal);
} else {
BCResumeVal->addIncoming(II.StartValue, LoopBypassBlocks[0]);
OrigPhi->setIncomingValue(BlockIdx, BCResumeVal);
}
}

// If we are generating a new induction variable then we also need to
// generate the code that calculates the exit value. This value is not
// simply the end of the counter because we may skip the vectorized body
// in case of a runtime check.
if (!OldInduction){
assert(!ResumeIndex && "Unexpected resume value found");
ResumeIndex = PHINode::Create(IdxTy, 2, "new.indc.resume.val",
MiddleBlock->getTerminator());
for (unsigned I = 1, E = LoopBypassBlocks.size(); I != E; ++I)
ResumeIndex->addIncoming(StartIdx, LoopBypassBlocks[I]);
ResumeIndex->addIncoming(IdxEndRoundDown, VecBody);
}

// Make sure that we found the index where scalar loop needs to continue.
assert(ResumeIndex && ResumeIndex->getType()->isIntegerTy() &&
"Invalid resume Index");

// Add a check in the middle block to see if we have completed
// all of the iterations in the first vector loop.
// If (N - N%VF) == N, then we don't need to run the remainder.
Value *CmpN = CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_EQ, IdxEnd,
ResumeIndex, "cmp.n",
MiddleBlock->getTerminator());
ReplaceInstWithInst(MiddleBlock->getTerminator(),
BranchInst::Create(ExitBlock, ScalarPH, CmpN));

// Create i+1 and fill the PHINode.
Value *NextIdx = Builder.CreateAdd(Induction, Step, "index.next");
Induction->addIncoming(StartIdx, VectorPH);
Induction->addIncoming(NextIdx, VecBody);
// Create the compare.
Value *ICmp = Builder.CreateICmpEQ(NextIdx, IdxEndRoundDown);
Builder.CreateCondBr(ICmp, MiddleBlock, VecBody);

// Now we have two terminators. Remove the old one from the block.
VecBody->getTerminator()->eraseFromParent();

// Get ready to start creating new instructions into the vectorized body.
Builder.SetInsertPoint(VecBody->getFirstInsertionPt());

// Save the state.
LoopVectorPreHeader = VectorPH;
LoopScalarPreHeader = ScalarPH;
LoopMiddleBlock = MiddleBlock;
LoopExitBlock = ExitBlock;
LoopVectorBody.push_back(VecBody);
LoopScalarBody = OldBasicBlock;

LoopVectorizeHints Hints(Lp, true);
Hints.setAlreadyVectorized();		Hints.setAlreadyVectorized();
}		}
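
The resume-index logic above can be modelled outside of LLVM. The following is a minimal sketch, not code from the patch: `resumeIndex` and `remainderSkipped` are hypothetical helpers, and `Start`, `End` and `Step` stand in for `StartIdx`, `IdxEnd` and `Step`. It assumes a forward loop whose vector body advances by `Step` per iteration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): the index at which the scalar
// loop resumes. If a runtime check bypassed the vector body, the scalar loop
// starts from the beginning; otherwise it starts at the rounded-down end.
std::uint64_t resumeIndex(std::uint64_t Start, std::uint64_t End,
                          std::uint64_t Step, bool Bypassed) {
  assert(Step != 0 && End >= Start && "sketch assumes a forward loop");
  if (Bypassed)
    return Start;
  std::uint64_t N = End - Start;
  return Start + (N - N % Step); // plays the role of IdxEndRoundDown
}

// Models the "cmp.n" check in the middle block: the remainder loop is skipped
// only when the resume index already equals the end index.
bool remainderSkipped(std::uint64_t Start, std::uint64_t End,
                      std::uint64_t Step, bool Bypassed) {
  return resumeIndex(Start, End, Step, Bypassed) == End;
}
```

With `Step` = 4, a trip count of 16 skips the remainder, a trip count of 18 resumes the scalar loop at index 16, and a bypassed vector body always resumes at `Start`.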

namespace {		namespace {
struct CSEDenseMapInfo {		struct CSEDenseMapInfo {
static bool canHandle(Instruction *I) {		static bool canHandle(Instruction *I) {
return isa<InsertElementInst>(I) || isa<ExtractElementInst>(I) ||		return isa<InsertElementInst>(I) || isa<ExtractElementInst>(I) ||
isa<ShuffleVectorInst>(I) || isa<GetElementPtrInst>(I);		isa<ShuffleVectorInst>(I) || isa<GetElementPtrInst>(I);
[... 151 lines not shown ...]	static unsigned getVectorIntrinsicCost(CallInst *CI, unsigned VF,
Type *RetTy = ToVectorTy(CI->getType(), VF);		Type *RetTy = ToVectorTy(CI->getType(), VF);
SmallVector<Type *, 4> Tys;		SmallVector<Type *, 4> Tys;
for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i)		for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i)
Tys.push_back(ToVectorTy(CI->getArgOperand(i)->getType(), VF));		Tys.push_back(ToVectorTy(CI->getArgOperand(i)->getType(), VF));

return TTI.getIntrinsicInstrCost(ID, RetTy, Tys);		return TTI.getIntrinsicInstrCost(ID, RetTy, Tys);
}		}

void InnerLoopVectorizer::vectorizeLoop() {		class VectorizerDelegate : public LoopEditor::Delegate {
//===------------------------------------------------===//		InnerLoopVectorizer &LV;
//		public:
// Notice: any optimization or new instruction that goes		VectorizerDelegate(InnerLoopVectorizer &LV) : LV(LV) {
// into the code below should also be implemented in		};
// the cost-model.
//
//===------------------------------------------------===//
Constant *Zero = Builder.getInt32(0);

// In order to support reduction variables we need to be able to vectorize
// Phi nodes. Phi nodes have cycles, so we need to vectorize them in two
// stages. First, we create a new vector PHI node with no incoming edges.
// We use this value when we vectorize all of the instructions that use the
// PHI. Next, after all of the instructions in the block are complete we
// add the new incoming edges to the PHI. At this point all of the
// instructions in the basic block are vectorized, so we can use them to
// construct the PHI.
PhiVector RdxPHIsToFix;

// Scan the loop in a topological order to ensure that defs are vectorized
// before users.
LoopBlocksDFS DFS(OrigLoop);
DFS.perform(LI);

// Vectorize all of the blocks in the original loop.
for (LoopBlocksDFS::RPOIterator bb = DFS.beginRPO(),
be = DFS.endRPO(); bb != be; ++bb)
vectorizeBlockInLoop(*bb, &RdxPHIsToFix);

// At this point every instruction in the original loop is widened to
// a vector form. We are almost done. Now, we need to fix the PHI nodes
// that we vectorized. The PHI nodes are currently empty because we did
// not want to introduce cycles. Notice that the remaining PHI nodes
// that we need to fix are reduction variables.

// Create the 'reduced' values for each of the induction vars.
// The reduced values are the vector values that we scalarize and combine
// after the loop is finished.
for (PhiVector::iterator it = RdxPHIsToFix.begin(), e = RdxPHIsToFix.end();
it != e; ++it) {
PHINode *RdxPhi = *it;
assert(RdxPhi && "Unable to recover vectorized PHI");

// Find the reduction variable descriptor.
assert(Legal->getReductionVars()->count(RdxPhi) &&
"Unable to find the reduction variable");
RecurrenceDescriptor RdxDesc = (*Legal->getReductionVars())[RdxPhi];

RecurrenceDescriptor::RecurrenceKind RK = RdxDesc.getRecurrenceKind();
TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();
Instruction *LoopExitInst = RdxDesc.getLoopExitInstr();
RecurrenceDescriptor::MinMaxRecurrenceKind MinMaxKind =
RdxDesc.getMinMaxRecurrenceKind();
setDebugLocFromInst(Builder, ReductionStartValue);

// We need to generate a reduction vector from the incoming scalar.
// To do so, we need to generate the 'identity' vector and override
// one of the elements with the incoming scalar reduction. We need
// to do it in the vector-loop preheader.
Builder.SetInsertPoint(LoopBypassBlocks[1]->getTerminator());

// This is the vector-clone of the value that leaves the loop.
VectorParts &VectorExit = getVectorValue(LoopExitInst);
Type *VecTy = VectorExit[0]->getType();

// Find the reduction identity variable. Zero for addition, or, xor,
// one for multiplication, -1 for And.
Value *Identity;
Value *VectorStart;
if (RK == RecurrenceDescriptor::RK_IntegerMinMax ||
RK == RecurrenceDescriptor::RK_FloatMinMax) {
// MinMax reductions have the start value as their identity.
if (VF == 1) {
VectorStart = Identity = ReductionStartValue;
} else {
VectorStart = Identity =
Builder.CreateVectorSplat(VF, ReductionStartValue, "minmax.ident");
}
} else {
// Handle other reduction kinds:
Constant *Iden = RecurrenceDescriptor::getRecurrenceIdentity(
RK, VecTy->getScalarType());
if (VF == 1) {
Identity = Iden;
// This vector is the Identity vector where the first element is the
// incoming scalar reduction.
VectorStart = ReductionStartValue;
} else {
Identity = ConstantVector::getSplat(VF, Iden);

// This vector is the Identity vector where the first element is the
// incoming scalar reduction.
VectorStart =
Builder.CreateInsertElement(Identity, ReductionStartValue, Zero);
}
}

// Fix the vector-loop phi.

// Reductions do not have to start at zero. They can start with
// any loop invariant values.
VectorParts &VecRdxPhi = WidenMap.get(RdxPhi);
BasicBlock *Latch = OrigLoop->getLoopLatch();
Value *LoopVal = RdxPhi->getIncomingValueForBlock(Latch);
VectorParts &Val = getVectorValue(LoopVal);
for (unsigned part = 0; part < UF; ++part) {
// Make sure to add the reduction start value only to the
// first unroll part.
Value *StartVal = (part == 0) ? VectorStart : Identity;
cast<PHINode>(VecRdxPhi[part])->addIncoming(StartVal,
LoopVectorPreHeader);
cast<PHINode>(VecRdxPhi[part])->addIncoming(Val[part],
LoopVectorBody.back());
}

// Before each round, move the insertion point right between
// the PHIs and the values we are going to write.
// This allows us to write both PHINodes and the extractelement
// instructions.
Builder.SetInsertPoint(LoopMiddleBlock->getFirstInsertionPt());

VectorParts RdxParts;		Value *hookCloneValue(Value *V, ValueToValueMapTy &VM,
setDebugLocFromInst(Builder, LoopExitInst);		IRBuilder<> &IRB) override {
for (unsigned part = 0; part < UF; ++part) {		LV.Builder.SetInsertPoint(IRB.GetInsertPoint());
// This PHINode contains the vectorized reduction variable, or		unsigned UF = LV.UF;
// the initial value vector, if we bypass the vector loop.		LV.UF = 1;
VectorParts &RdxExitVal = getVectorValue(LoopExitInst);		InnerLoopVectorizer::PhiVector Dummy;
PHINode *NewPhi = Builder.CreatePHI(VecTy, 2, "rdx.vec.exit.phi");		LV.vectorizeSingleInstruction(cast<Instruction>(V), &Dummy);
Value *StartVal = (part == 0) ? VectorStart : Identity;
for (unsigned I = 1, E = LoopBypassBlocks.size(); I != E; ++I)		LV.UF = UF;
NewPhi->addIncoming(StartVal, LoopBypassBlocks[I]);		return LV.WidenMap.get(V)[0];
NewPhi->addIncoming(RdxExitVal[part],		}
LoopVectorBody.back());
RdxParts.push_back(NewPhi);		unsigned hookInterleaveInductionStep() override {
}		return LV.VF;
		}
// Reduce all of the unrolled parts into a single vector.
Value *ReducedPartRdx = RdxParts[0];		Value *hookReduceReductions(LoopEditor::Reduction *R,
unsigned Op = RecurrenceDescriptor::getRecurrenceBinOp(RK);		ArrayRef<Value*> Values,
setDebugLocFromInst(Builder, ReducedPartRdx);		IRBuilder<> &IRB) override {
for (unsigned part = 1; part < UF; ++part) {		LV.Builder.SetInsertPoint(IRB.GetInsertPoint());
if (Op != Instruction::ICmp && Op != Instruction::FCmp)
// Floating point operations had to be 'fast' to enable the reduction.		Value *ReducedPartRdx = nullptr;
ReducedPartRdx = addFastMathFlag(		for (auto *V : Values) {
Builder.CreateBinOp((Instruction::BinaryOps)Op, RdxParts[part],		if (!ReducedPartRdx)
ReducedPartRdx, "bin.rdx"));		ReducedPartRdx = V;
else		else
ReducedPartRdx = RecurrenceDescriptor::createMinMaxOp(		ReducedPartRdx = R->createOp(IRB, ReducedPartRdx, V);
Builder, MinMaxKind, ReducedPartRdx, RdxParts[part]);
}		}

if (VF > 1) {		if (LV.VF == 1)
		return ReducedPartRdx;

// VF is a power of 2 so we can emit the reduction using log2(VF) shuffles		// VF is a power of 2 so we can emit the reduction using log2(VF) shuffles
// and vector ops, reducing the set of values being computed by half each		// and vector ops, reducing the set of values being computed by half each
// round.		// round.
assert(isPowerOf2_32(VF) &&		assert(isPowerOf2_32(LV.VF) &&
"Reduction emission only supported for pow2 vectors!");		"Reduction emission only supported for pow2 vectors!");
Value *TmpVec = ReducedPartRdx;		Value *TmpVec = ReducedPartRdx;
SmallVector<Constant*, 32> ShuffleMask(VF, nullptr);		SmallVector<Constant*, 32> ShuffleMask(LV.VF, nullptr);
for (unsigned i = VF; i != 1; i >>= 1) {		for (unsigned i = LV.VF; i != 1; i >>= 1) {
// Move the upper half of the vector to the lower half.		// Move the upper half of the vector to the lower half.
for (unsigned j = 0; j != i/2; ++j)		for (unsigned j = 0; j != i/2; ++j)
ShuffleMask[j] = Builder.getInt32(i/2 + j);		ShuffleMask[j] = LV.Builder.getInt32(i/2 + j);

// Fill the rest of the mask with undef.		// Fill the rest of the mask with undef.
std::fill(&ShuffleMask[i/2], ShuffleMask.end(),		std::fill(&ShuffleMask[i/2], ShuffleMask.end(),
UndefValue::get(Builder.getInt32Ty()));		UndefValue::get(LV.Builder.getInt32Ty()));

Value *Shuf =		Value *Shuf =
Builder.CreateShuffleVector(TmpVec,		IRB.CreateShuffleVector(TmpVec,
UndefValue::get(TmpVec->getType()),		UndefValue::get(TmpVec->getType()),
ConstantVector::get(ShuffleMask),		ConstantVector::get(ShuffleMask),
"rdx.shuf");		"rdx.shuf");

if (Op != Instruction::ICmp && Op != Instruction::FCmp)		TmpVec = R->createOp(IRB, TmpVec, Shuf);
// Floating point operations had to be 'fast' to enable the reduction.
TmpVec = addFastMathFlag(Builder.CreateBinOp(
(Instruction::BinaryOps)Op, TmpVec, Shuf, "bin.rdx"));
else
TmpVec = RecurrenceDescriptor::createMinMaxOp(Builder, MinMaxKind,
TmpVec, Shuf);
}		}

// The result is in the first element of the vector.		// The result is in the first element of the vector.
ReducedPartRdx = Builder.CreateExtractElement(TmpVec,		return IRB.CreateExtractElement(TmpVec, IRB.getInt32(0));
Builder.getInt32(0));
}		}
		};
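
The halving scheme used by `hookReduceReductions` can be illustrated on a plain array. This is a sketch under stated assumptions, not the patch's code: `reduceByHalving` is a hypothetical name, the lanes are `int`s, and the shuffle is modelled by reading lane `I/2 + J` directly instead of building a shuffle mask. Only the live lower lanes are tracked; the undef upper lanes of the real shuffle are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Reduce all lanes of Vec to a single value in log2(VF) rounds. Each round
// combines the upper half of the live lanes into the lower half, halving the
// number of live lanes, until the result sits in lane 0.
int reduceByHalving(std::vector<int> Vec,
                    const std::function<int(int, int)> &Op) {
  std::size_t VF = Vec.size();
  assert(VF != 0 && (VF & (VF - 1)) == 0 &&
         "Reduction emission only supported for pow2 vectors!");
  for (std::size_t I = VF; I != 1; I >>= 1) {
    // Lane J is combined with lane I/2 + J, which is what the shuffle
    // followed by the reduction op achieves in the vector form.
    for (std::size_t J = 0; J != I / 2; ++J)
      Vec[J] = Op(Vec[J], Vec[I / 2 + J]);
  }
  // The result is in the first element of the vector.
  return Vec[0];
}
```

Passing an addition lambda sums the lanes, and passing a max lambda models a min/max reduction, both in the same number of rounds.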

// Create a phi node that merges control-flow from the backedge-taken check		void InnerLoopVectorizer::vectorizeLoop() {
// block and the middle block.		//===------------------------------------------------===//
PHINode *BCBlockPhi = PHINode::Create(RdxPhi->getType(), 2, "bc.merge.rdx",		//
LoopScalarPreHeader->getTerminator());		// Notice: any optimization or new instruction that goes
BCBlockPhi->addIncoming(ReductionStartValue, LoopBypassBlocks[0]);		// into the code below should also be implemented in
BCBlockPhi->addIncoming(ReducedPartRdx, LoopMiddleBlock);		// the cost-model.
		//
// Now, we need to fix the users of the reduction variable		//===------------------------------------------------===//
// inside and outside of the scalar remainder loop.
// We know that the loop is in LCSSA form. We need to update the
// PHI nodes in the exit blocks.
for (BasicBlock::iterator LEI = LoopExitBlock->begin(),
LEE = LoopExitBlock->end(); LEI != LEE; ++LEI) {
PHINode *LCSSAPhi = dyn_cast<PHINode>(LEI);
if (!LCSSAPhi) break;

// All PHINodes need to have a single entry edge, or two if
// we already fixed them.
assert(LCSSAPhi->getNumIncomingValues() < 3 && "Invalid LCSSA PHI");

// We found our reduction value exit-PHI. Update it with the
// incoming bypass edge.
if (LCSSAPhi->getIncomingValue(0) == LoopExitInst) {
// Add an edge coming from the bypass.
LCSSAPhi->addIncoming(ReducedPartRdx, LoopMiddleBlock);
break;
}
}// end of the LCSSA phi scan.

// Fix the scalar loop reduction variable with the incoming reduction sum		BasicBlock *PredBB;
// from the vector body and from the backedge value.		const DataLayout &DL =
int IncomingEdgeBlockIdx =		OrigLoop->getHeader()->getParent()->getParent()->getDataLayout();
(RdxPhi)->getBasicBlockIndex(OrigLoop->getLoopLatch());		LoopEditor ScalarLE(OrigLoop, SE, &DL, LI, DT);
assert(IncomingEdgeBlockIdx >= 0 && "Invalid block index");		VectorizerDelegate D(*this);
// Pick the other block.		ScalarLE.versionWidenAndInterleaveLoop(UF, PredBB, &D);
int SelfEdgeBlockIdx = (IncomingEdgeBlockIdx ? 0 : 1);
(RdxPhi)->setIncomingValue(SelfEdgeBlockIdx, BCBlockPhi);
(RdxPhi)->setIncomingValue(IncomingEdgeBlockIdx, LoopExitInst);
}// end of for each redux variable.

fixLCSSAPHIs();		Instruction *MemRuntimeCheck, *FirstCheckInst;
		std::tie(FirstCheckInst, MemRuntimeCheck) =
		Legal->getLAI()->addRuntimeCheck(PredBB->getTerminator());
		if (MemRuntimeCheck) {
		AddedSafetyChecks = true;
		// Create a new block containing the memory check.
		PredBB->setName("vector.memcheck");
		}

// Remove redundant induction instructions.
cse(LoopVectorBody);
}		}
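
The reduction setup in the old `vectorizeLoop()` seeds the vector PHIs with an identity vector whose first lane carries the incoming scalar, and only for the first unroll part. A minimal sketch of why that preserves the result, assuming an integer add reduction (identity 0); `makeReductionStart` and `sumLanes` are hypothetical helpers and not part of the patch. Min/max reductions are different: per the comments in the old code they use a splat of the start value instead.

```cpp
#include <cassert>
#include <vector>

// Build the start vectors for UF unroll parts of width VF. Every lane holds
// the identity, except lane 0 of part 0, which holds the scalar start value.
std::vector<std::vector<int>> makeReductionStart(int Start, int Identity,
                                                 unsigned VF, unsigned UF) {
  std::vector<std::vector<int>> Parts(UF, std::vector<int>(VF, Identity));
  if (UF != 0 && VF != 0)
    Parts[0][0] = Start; // the start value is added exactly once
  return Parts;
}

// Combine every lane of every part, as the final reduction would for an add.
int sumLanes(const std::vector<std::vector<int>> &Parts) {
  int Sum = 0;
  for (const std::vector<int> &Part : Parts)
    for (int Lane : Part)
      Sum += Lane;
  return Sum;
}
```

Because every other lane holds the identity, reducing the start vectors alone yields the original start value, so the start value is not counted once per lane or once per unroll part.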

void InnerLoopVectorizer::fixLCSSAPHIs() {		void InnerLoopVectorizer::fixLCSSAPHIs() {
for (BasicBlock::iterator LEI = LoopExitBlock->begin(),		for (BasicBlock::iterator LEI = LoopExitBlock->begin(),
LEE = LoopExitBlock->end(); LEI != LEE; ++LEI) {		LEE = LoopExitBlock->end(); LEI != LEE; ++LEI) {
PHINode *LCSSAPhi = dyn_cast<PHINode>(LEI);		PHINode *LCSSAPhi = dyn_cast<PHINode>(LEI);
if (!LCSSAPhi) break;		if (!LCSSAPhi) break;
if (LCSSAPhi->getNumIncomingValues() == 1)		if (LCSSAPhi->getNumIncomingValues() == 1)
[... 188 lines not shown ...]	case LoopVectorizationLegality::IK_PtrInduction:
}		}
return;		return;
}		}
}		}

void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) {		void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) {
// For each instruction in the old loop.		// For each instruction in the old loop.
for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {		for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {
		vectorizeSingleInstruction(it, PV);
		}
		}

		void InnerLoopVectorizer::vectorizeSingleInstruction(Instruction *it, PhiVector *PV) {
VectorParts &Entry = WidenMap.get(it);		VectorParts &Entry = WidenMap.get(it);
switch (it->getOpcode()) {		switch (it->getOpcode()) {
case Instruction::Br:		case Instruction::Br:
// Nothing to do for PHIs and BR, since we already took care of the		// Nothing to do for PHIs and BR, since we already took care of the
// loop control flow instructions.		// loop control flow instructions.
continue;		return;
case Instruction::PHI: {		case Instruction::PHI: {
// Vectorize PHINodes.		// Vectorize PHINodes.
widenPHIInstruction(it, Entry, UF, VF, PV);		widenPHIInstruction(it, Entry, UF, VF, PV);
continue;		return;
}// End of PHI.		}// End of PHI.

case Instruction::Add:		case Instruction::Add:
case Instruction::FAdd:		case Instruction::FAdd:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::FSub:		case Instruction::FSub:
case Instruction::Mul:		case Instruction::Mul:
case Instruction::FMul:		case Instruction::FMul:
[... 127 lines not shown ...]	void InnerLoopVectorizer::vectorizeSingleInstruction(Instruction *it, PhiVector *PV) {
}		}

case Instruction::Call: {		case Instruction::Call: {
// Ignore dbg intrinsics.		// Ignore dbg intrinsics.
if (isa<DbgInfoIntrinsic>(it))		if (isa<DbgInfoIntrinsic>(it))
break;		break;
setDebugLocFromInst(Builder, it);		setDebugLocFromInst(Builder, it);

Module *M = BB->getParent()->getParent();		Module *M = it->getParent()->getParent()->getParent();
CallInst *CI = cast<CallInst>(it);		CallInst *CI = cast<CallInst>(it);

StringRef FnName = CI->getCalledFunction()->getName();		StringRef FnName = CI->getCalledFunction()->getName();
Function *F = CI->getCalledFunction();		Function *F = CI->getCalledFunction();
Type *RetTy = ToVectorTy(CI->getType(), VF);		Type *RetTy = ToVectorTy(CI->getType(), VF);
SmallVector<Type *, 4> Tys;		SmallVector<Type *, 4> Tys;
for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i)		for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i)
Tys.push_back(ToVectorTy(CI->getArgOperand(i)->getType(), VF));		Tys.push_back(ToVectorTy(CI->getArgOperand(i)->getType(), VF));
[... 58 lines not shown ...]	case Instruction::Call: {
break;		break;
}		}

default:		default:
// All other instructions are unsupported. Scalarize them.		// All other instructions are unsupported. Scalarize them.
scalarizeInstruction(it);		scalarizeInstruction(it);
break;		break;
}// end of switch.		}// end of switch.
}// end of for_each instr.
}		}
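
Splitting `vectorizeSingleInstruction()` out of the per-block loop is what lets a delegate hook widen one value at a time. The shape of that interaction can be sketched with a toy model. Everything here is hypothetical and stands in for the real classes: `ToyDelegate` for `LoopEditor::Delegate`, `ToyWidener` for `VectorizerDelegate`, and strings for IR values.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for LoopEditor::Delegate: the editor owns the
// traversal, the delegate decides how each value is cloned.
struct ToyDelegate {
  virtual ~ToyDelegate() = default;
  // Default behaviour: a plain copy of the value.
  virtual std::string hookCloneValue(const std::string &V) { return V; }
};

// Hypothetical stand-in for VectorizerDelegate: widens each cloned value.
struct ToyWidener : ToyDelegate {
  unsigned VF;
  explicit ToyWidener(unsigned VF) : VF(VF) {}
  std::string hookCloneValue(const std::string &V) override {
    return "<" + std::to_string(VF) + " x " + V + ">";
  }
};

// The "editor": walks the body and calls the hook once per instruction.
std::vector<std::string> cloneBody(const std::vector<std::string> &Body,
                                   ToyDelegate &D) {
  std::vector<std::string> Out;
  for (const std::string &V : Body)
    Out.push_back(D.hookCloneValue(V));
  return Out;
}
```

In this model the editor never needs to know about vector factors; swapping the delegate changes how values are cloned without touching the traversal.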

void InnerLoopVectorizer::updateAnalysis() {		void InnerLoopVectorizer::updateAnalysis() {
// Forget the original basic block.
SE->forgetLoop(OrigLoop);

// Update the dominator tree information.
assert(DT->properlyDominates(LoopBypassBlocks.front(), LoopExitBlock) &&
"Entry does not dominate exit.");

for (unsigned I = 1, E = LoopBypassBlocks.size(); I != E; ++I)
DT->addNewBlock(LoopBypassBlocks[I], LoopBypassBlocks[I-1]);
DT->addNewBlock(LoopVectorPreHeader, LoopBypassBlocks.back());

// Due to if predication of stores we might create a sequence of "if(pred)
// a[i] = ...; " blocks.
for (unsigned i = 0, e = LoopVectorBody.size(); i != e; ++i) {
if (i == 0)
DT->addNewBlock(LoopVectorBody[0], LoopVectorPreHeader);
else if (isPredicatedBlock(i)) {
DT->addNewBlock(LoopVectorBody[i], LoopVectorBody[i-1]);
} else {
DT->addNewBlock(LoopVectorBody[i], LoopVectorBody[i-2]);
}
}

DT->addNewBlock(LoopMiddleBlock, LoopBypassBlocks[1]);
DT->addNewBlock(LoopScalarPreHeader, LoopBypassBlocks[0]);
DT->changeImmediateDominator(LoopScalarBody, LoopScalarPreHeader);
DT->changeImmediateDominator(LoopExitBlock, LoopBypassBlocks[0]);

DEBUG(DT->verifyDomTree());		DEBUG(DT->verifyDomTree());
}		}

/// \brief Check whether it is safe to if-convert this phi node.		/// \brief Check whether it is safe to if-convert this phi node.
///		///
/// Phi nodes with constant expressions that can trap are not safe to if		/// Phi nodes with constant expressions that can trap are not safe to if
/// convert.		/// convert.
static bool canIfConvertPHINodes(BasicBlock *BB) {		static bool canIfConvertPHINodes(BasicBlock *BB) {
[... 1,676 lines not shown ...]

Revision Contents

Diff 31326