Index: docs/Proposals/VectorizationPlan.rst
===================================================================
--- docs/Proposals/VectorizationPlan.rst
+++ docs/Proposals/VectorizationPlan.rst
@@ -82,8 +82,14 @@
    replicated VF*UF times to handle scalarized and predicated instructions.
    Innerloops are also modelled as SESE regions.

-Low-level Design
-================
+7. Support instruction-level analysis and transformation, as part of Planning
+   Step 2.b: During vectorization instructions may need to be traversed, moved,
+   replaced by other instructions or be created. For example, vector idiom
+   detection and formation involves searching for and optimizing instruction
+   patterns.
+
+Definitions
+===========
 The low-level design of VPlan comprises of the following classes.

 :LoopVectorizationPlanner:
@@ -139,11 +145,64 @@
   instructions; e.g., cloned once, replicated multiple times or widened
   according to selected VF.

+:VPValue:
+  The base of VPlan's def-use relations class hierarchy. When instantiated, it
+  models a constant or a live-in Value in VPlan. It has users, which are of type
+  VPUser, but no operands.
+
+:VPUser:
+  A VPValue representing a general vertex in the def-use graph of VPlan. It has
+  operands which are of type VPValue. When instantiated, it represents a
+  live-out Instruction that exists outside VPlan. VPUser is similar in some
+  aspects to LLVM's User class.
+
+:VPInstruction:
+  A VPInstruction is both a VPRecipe and a VPUser. It models a single
+  VPlan-level instruction to be generated if the VPlan is executed, including
+  its opcode and possibly additional characteristics. It is the basis for
+  writing instruction-level analyses and optimizations in VPlan as creating,
+  replacing or moving VPInstructions record both def-use and scheduling
+  decisions. VPInstructions also extend LLVM IR's opcodes with idiomatic
+  operations that enrich the Vectorizer's semantics.
+
 :VPTransformState:
   Stores information used for generating output IR, passed from
   LoopVectorizationPlanner to its selected VPlan for execution, and used to
   pass additional information down to VPBlocks and VPRecipes.

+The Planning Process and VPlan Roadmap
+======================================
+
+Transforming the Loop Vectorizer to use VPlan follows a staged approach. First,
+VPlan is used to record the final vectorization decisions, and to execute them:
+the Hierarchical CFG models the planned control-flow, and Recipes capture
+decisions taken inside basic-blocks. Next, VPlan will be used also as the basis
+for taking these decisions, effectively turning them into a series of
+VPlan-to-VPlan algorithms. Finally, VPlan will support the planning process
+itself including cost-based analyses for making these decisions, to fully
+support compositional and iterative decision making.
+
+Some decisions are local to an instruction in the loop, such as whether to widen
+it into a vector instruction or replicate it, keeping the generated instructions
+in place. Other decisions, however, involve moving instructions, replacing them
+with other instructions, and/or introducing new instructions. For example, a
+cast may sink past a later instruction and be widened to handle first-order
+recurrence; an interleave group of strided gathers or scatters may effectively
+move to one place where they are replaced with shuffles and a common wide vector
+load or store; new instructions may be introduced to compute masks, shuffle the
+elements of vectors, and pack scalar values into vectors or vice-versa.
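To make the mask modeling above concrete, here is a minimal sketch (not part of the proposal text or of the patch) of how such mask computations can be written with the VPBuilder that this patch adds in VPlanBuilder.h. The function and variable names below are hypothetical; only VPBuilder, VPBasicBlock, VPValue and VPlan::addVPValue/getVPValue come from the patch.

  // Sketch only: build the entry mask of a block with two incoming edges, both
  // leaving a predecessor guarded by IRCondition. All names are hypothetical.
  VPValue *buildBlockMaskSketch(VPlan &Plan, VPBasicBlock *VPBB,
                                Value &IRCondition, VPValue *PredMask) {
    VPBuilder Builder;
    Builder.setInsertPoint(VPBB);   // New VPInstructions are appended to VPBB.
    Plan.addVPValue(IRCondition);   // Model the live-in branch condition.
    VPValue *Cond = Plan.getVPValue(IRCondition);
    // Mask of the taken edge: the branch condition AND the predecessor's mask.
    VPValue *TrueEdge = Builder.createAnd(Cond, PredMask);
    // Mask of the fall-through edge: the negated condition AND the same mask.
    VPValue *FalseEdge = Builder.createAnd(Builder.createNot(Cond), PredMask);
    // The block-in mask is the OR of all incoming edge masks.
    return Builder.createOr(TrueEdge, FalseEdge);
  }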
+
+In order for VPlan to support making instruction-level decisions and analyses,
+it needs to model the relevant instructions along with their def/use relations.
+This too follows a staged approach: first, the new instructions that compute
+masks are modeled as VPInstructions, along with their induced def/use subgraph.
+This effectively models masks in VPlan, facilitating VPlan-based predication.
+Next, the logic embedded within each Recipe for generating its instructions at
+VPlan execution time will instead take part in the planning process by modeling
+them as VPInstructions. Finally, only logic that applies to instructions as a
+group will remain in Recipes, such as interleave groups and potentially other
+idiom groups having synergistic cost.
+
 Related LLVM components
 -----------------------
 1. SLP Vectorizer: one can compare the VPlan model with LLVM's existing SLP
@@ -152,6 +211,9 @@
 2. RegionInfo: one can compare VPlan's H-CFG with the Region Analysis as used by
    Polly [7]_.

+3. Loop Vectorizer: the Vectorization Plan aims to upgrade the infrastructure of
+   the Loop Vectorizer and extend it to handle outer loops [8]_, [9]_.
+
 References
 ----------
 .. [1] "Outer-loop vectorization: revisited for short SIMD architectures", Dorit
@@ -180,3 +242,6 @@

 .. [8] "Introducing VPlan to the Loop Vectorizer", Gil Rapaport and Ayal Zaks,
    European LLVM Developers' Meeting 2017.
+
+.. [9] "Extending LoopVectorizer: OpenMP4.5 SIMD and Outer Loop
+   Auto-Vectorization", Intel Vectorizer Team, LLVM Developers' Meeting 2016.
Index: lib/Transforms/Vectorize/LoopVectorize.cpp
===================================================================
--- lib/Transforms/Vectorize/LoopVectorize.cpp
+++ lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -48,6 +48,7 @@
 #include "llvm/Transforms/Vectorize/LoopVectorize.h"
 #include "VPlan.h"
+#include "VPlanBuilder.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/Hashing.h"
 #include "llvm/ADT/MapVector.h"
@@ -252,10 +253,12 @@
 class LoopVectorizationLegality;
 class LoopVectorizationCostModel;
 class LoopVectorizationRequirements;
+class VPBlendRecipe;
 class VPInterleaveRecipe;
 class VPReplicateRecipe;
 class VPWidenIntOrFpInductionRecipe;
 class VPWidenRecipe;
+class VPWidenMemoryInstructionRecipe;

 /// Returns true if the given loop body has a cycle, excluding the loop
 /// itself.
@@ -421,11 +424,6 @@
   /// new unrolled loop, where UF is the unroll factor.
   typedef SmallVector<Value *, 2> VectorParts;

-  /// A helper function that computes the predicate of the block BB, assuming
-  /// that the header block of the loop is set to True. It returns the *entry*
-  /// mask for the block BB.
-  VectorParts createBlockInMask(BasicBlock *BB);
-
   /// Vectorize a single PHINode in a block. This method handles the induction
   /// variable canonicalization. It supports both VF = 1 for unrolled loops and
   /// arbitrary length vectors.
@@ -478,6 +476,15 @@
   /// Try to vectorize the interleaved access group that \p Instr belongs to.
   void vectorizeInterleaveGroup(Instruction *Instr);

+  /// Vectorize Load and Store instructions, optionally masking the vector
+  /// operations if \p BlockInMask is non-null.
+  void vectorizeMemoryInstruction(Instruction *Instr,
+                                  VectorParts *BlockInMask = nullptr);
+
+  /// \brief Set the debug location in the builder using the debug location in
+  /// the instruction.
+  void setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr);
+
 protected:
   /// A small list of PHINodes.
   typedef SmallVector<PHINode *, 4> PhiVector;
@@ -488,12 +495,6 @@
   /// vectorization factor.
typedef SmallVector, 2> ScalarParts; - // When we if-convert we need to create edge masks. We have to cache values - // so that we don't end up with exponential recursion/IR. - typedef DenseMap, VectorParts> - EdgeMaskCacheTy; - typedef DenseMap BlockMaskCacheTy; - /// Set up the values of the IVs correctly when exiting the vector loop. void fixupIVUsers(PHINode *OrigPhi, const InductionDescriptor &II, Value *CountRoundDown, Value *EndValue, @@ -528,17 +529,10 @@ /// represented as. void truncateToMinimalBitwidths(); - /// A helper function that computes the predicate of the edge between SRC - /// and DST. - VectorParts createEdgeMask(BasicBlock *Src, BasicBlock *Dst); - /// Insert the new loop to the loop hierarchy and pass manager /// and update the analysis passes. void updateAnalysis(); - /// Vectorize Load and Store instructions, - virtual void vectorizeMemoryInstruction(Instruction *Instr); - /// Create a broadcast instruction. This method generates a broadcast /// instruction (shuffle) for loop invariant values and for the induction /// value. If this is the induction variable then we extend it to N, N+1, ... @@ -617,10 +611,6 @@ /// vector of instructions. void addMetadata(ArrayRef To, Instruction *From); - /// \brief Set the debug location in the builder using the debug location in - /// the instruction. - void setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr); - /// The original loop. Loop *OrigLoop; /// A wrapper around ScalarEvolution used to add runtime SCEV checks. Applies @@ -690,8 +680,6 @@ /// Store instructions that were predicated. SmallVector PredicatedInstructions; - EdgeMaskCacheTy EdgeMaskCache; - BlockMaskCacheTy BlockMaskCache; /// Trip count of the original loop. Value *TripCount; /// Trip count of the widened loop (TripCount - TripCount % (VF*UF)) @@ -2137,6 +2125,31 @@ SmallVector VPlans; + /// The current plan; + VPlan *Plan = nullptr; + + /// This class is used to enable the VPlan to invoke a method of ILV. This is + /// needed until the method is refactored out of ILV and becomes reusable. + struct VPCallbackILV : public VPCallback { + InnerLoopVectorizer &ILV; + + VPCallbackILV(InnerLoopVectorizer &ILV) : ILV(ILV) {} + + Value *getOrCreateVectorValues(Value *V, unsigned Part) override { + return ILV.getOrCreateVectorValue(V, Part); + } + }; + + /// A builder used to construct the current plan. + VPBuilder Builder; + + /// When we if-convert we need to create edge masks. We have to cache values + /// so that we don't end up with exponential recursion/IR. Note that + /// if-conversion currently takes place during VPlan-construction, so these + /// caches are only used at that stage. + DenseMap, VPValue *> EdgeMaskCache; + DenseMap BlockMaskCache; + unsigned BestVF; unsigned BestUF; @@ -2198,6 +2211,15 @@ void buildVPlans(unsigned MinVF, unsigned MaxVF); private: + /// A helper function that computes the predicate of the block BB, assuming + /// that the header block of the loop is set to True. It returns the *entry* + /// mask for the block BB. + VPValue *createBlockInMask(BasicBlock *BB); + + /// A helper function that computes the predicate of the edge between SRC + /// and DST. + VPValue *createEdgeMask(BasicBlock *Src, BasicBlock *Dst); + /// Check if \I belongs to an Interleave Group within the given VF \p Range, /// \return true in the first returned value if so and false otherwise. /// Build a new VPInterleaveGroup Recipe if \I is the primary member of an IG @@ -2208,6 +2230,12 @@ /// to \p Range.End. 
VPInterleaveRecipe *tryToInterleaveMemory(Instruction *I, VFRange &Range); + // Check if \I is a memory instruction to be widened for \p Range.Start and + // potentially masked. Such instructions are handled by a recipe that takes an + // additional VPInstruction for the mask. + VPWidenMemoryInstructionRecipe *tryToWidenMemory(Instruction *I, + VFRange &Range); + /// Check if an induction recipe should be constructed for \I within the given /// VF \p Range. If so build and return it. If not, return null. \p Range.End /// may be decreased to ensure same decision from \p Range.Start to @@ -2215,6 +2243,11 @@ VPWidenIntOrFpInductionRecipe *tryToOptimizeInduction(Instruction *I, VFRange &Range); + /// Handle non-loop phi nodes. Currently all such phi nodes are turned into + /// a sequence of select instructions as the vectorizer currently performs + /// full if-conversion. + VPBlendRecipe *tryToBlend(Instruction *I); + /// Check if \p I can be widened within the given VF \p Range. If \p I can be /// widened for \p Range.Start, check if the last recipe of \p VPBB can be /// extended to include \p I or else build a new VPWidenRecipe for it and @@ -2241,7 +2274,7 @@ /// Build a VPlan according to the information gathered by Legal. \return a /// VPlan for vectorization factors \p Range.Start and up to \p Range.End /// exclusive, possibly decreasing \p Range.End. - VPlan *buildVPlan(VFRange &Range); + VPlan *buildVPlan(VFRange &Range, const SmallPtrSetImpl &NeedDef); }; } // namespace llvm @@ -2989,7 +3022,8 @@ } } -void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) { +void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr, + VectorParts *BlockInMask) { // Attempt to issue a wide load. LoadInst *LI = dyn_cast(Instr); StoreInst *SI = dyn_cast(Instr); @@ -3030,7 +3064,11 @@ if (ConsecutiveStride) Ptr = getOrCreateScalarValue(Ptr, {0, 0}); - VectorParts Mask = createBlockInMask(Instr->getParent()); + VectorParts Mask; + bool isMaskRequired = BlockInMask; + if (isMaskRequired) + Mask = *BlockInMask; + // Handle Stores: if (SI) { assert(!Legal->isUniform(SI->getPointerOperand()) && @@ -3041,7 +3079,7 @@ Instruction *NewSI = nullptr; Value *StoredVal = getOrCreateVectorValue(SI->getValueOperand(), Part); if (CreateGatherScatter) { - Value *MaskPart = Legal->isMaskRequired(SI) ? Mask[Part] : nullptr; + Value *MaskPart = isMaskRequired ? Mask[Part] : nullptr; Value *VectorGep = getOrCreateVectorValue(Ptr, Part); NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment, MaskPart); @@ -3063,14 +3101,14 @@ Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(-Part * VF)); PartPtr = Builder.CreateGEP(nullptr, PartPtr, Builder.getInt32(1 - VF)); - if (Mask[Part]) // The reverse of a null all-one mask is a null mask. + if (isMaskRequired) // Reverse of a null all-one mask is a null mask. Mask[Part] = reverseVector(Mask[Part]); } Value *VecPtr = Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace)); - if (Legal->isMaskRequired(SI) && Mask[Part]) + if (isMaskRequired) NewSI = Builder.CreateMaskedStore(StoredVal, VecPtr, Alignment, Mask[Part]); else @@ -3087,7 +3125,7 @@ for (unsigned Part = 0; Part < UF; ++Part) { Value *NewLI; if (CreateGatherScatter) { - Value *MaskPart = Legal->isMaskRequired(LI) ? Mask[Part] : nullptr; + Value *MaskPart = isMaskRequired ? 
Mask[Part] : nullptr; Value *VectorGep = getOrCreateVectorValue(Ptr, Part); NewLI = Builder.CreateMaskedGather(VectorGep, Alignment, MaskPart, nullptr, "wide.masked.gather"); @@ -3102,13 +3140,13 @@ // wide load needs to start at the last vector element. PartPtr = Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(-Part * VF)); PartPtr = Builder.CreateGEP(nullptr, PartPtr, Builder.getInt32(1 - VF)); - if (Mask[Part]) // The reverse of a null all-one mask is a null mask. + if (isMaskRequired) // Reverse of a null all-one mask is a null mask. Mask[Part] = reverseVector(Mask[Part]); } Value *VecPtr = Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace)); - if (Legal->isMaskRequired(LI) && Mask[Part]) + if (isMaskRequired) NewLI = Builder.CreateMaskedLoad(VecPtr, Alignment, Mask[Part], UndefValue::get(DataTy), "wide.masked.load"); @@ -4406,79 +4444,11 @@ } while (Changed); } -InnerLoopVectorizer::VectorParts -InnerLoopVectorizer::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) { - assert(is_contained(predecessors(Dst), Src) && "Invalid edge"); - - // Look for cached value. - std::pair Edge(Src, Dst); - EdgeMaskCacheTy::iterator ECEntryIt = EdgeMaskCache.find(Edge); - if (ECEntryIt != EdgeMaskCache.end()) - return ECEntryIt->second; - - VectorParts SrcMask = createBlockInMask(Src); - - // The terminator has to be a branch inst! - BranchInst *BI = dyn_cast(Src->getTerminator()); - assert(BI && "Unexpected terminator found"); - - if (!BI->isConditional()) - return EdgeMaskCache[Edge] = SrcMask; - - VectorParts EdgeMask(UF); - for (unsigned Part = 0; Part < UF; ++Part) { - auto *EdgeMaskPart = getOrCreateVectorValue(BI->getCondition(), Part); - if (BI->getSuccessor(0) != Dst) - EdgeMaskPart = Builder.CreateNot(EdgeMaskPart); - - if (SrcMask[Part]) // Otherwise block in-mask is all-one, no need to AND. - EdgeMaskPart = Builder.CreateAnd(EdgeMaskPart, SrcMask[Part]); - - EdgeMask[Part] = EdgeMaskPart; - } - - return EdgeMaskCache[Edge] = EdgeMask; -} - -InnerLoopVectorizer::VectorParts -InnerLoopVectorizer::createBlockInMask(BasicBlock *BB) { - assert(OrigLoop->contains(BB) && "Block is not a part of a loop"); - - // Look for cached value. - BlockMaskCacheTy::iterator BCEntryIt = BlockMaskCache.find(BB); - if (BCEntryIt != BlockMaskCache.end()) - return BCEntryIt->second; - - // All-one mask is modelled as no-mask following the convention for masked - // load/store/gather/scatter. Initialize BlockMask to no-mask. - VectorParts BlockMask(UF); - for (unsigned Part = 0; Part < UF; ++Part) - BlockMask[Part] = nullptr; - - // Loop incoming mask is all-one. - if (OrigLoop->getHeader() == BB) - return BlockMaskCache[BB] = BlockMask; - - // This is the block mask. We OR all incoming edges. - for (auto *Predecessor : predecessors(BB)) { - VectorParts EdgeMask = createEdgeMask(Predecessor, BB); - if (!EdgeMask[0]) // Mask of predecessor is all-one so mask of block is too. - return BlockMaskCache[BB] = EdgeMask; - - if (!BlockMask[0]) { // BlockMask has its initialized nullptr value. - BlockMask = EdgeMask; - continue; - } - - for (unsigned Part = 0; Part < UF; ++Part) - BlockMask[Part] = Builder.CreateOr(BlockMask[Part], EdgeMask[Part]); - } - - return BlockMaskCache[BB] = BlockMask; -} - void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF, unsigned VF) { + assert(PN->getParent() == OrigLoop->getHeader() && + "Non-header phis should have been handled elsewhere"); + PHINode *P = cast(PN); // In order to support recurrences we need to be able to vectorize Phi nodes. 
// Phi nodes have cycles, so we need to vectorize them in two stages. This is @@ -4497,43 +4467,6 @@ } setDebugLocFromInst(Builder, P); - // Check for PHI nodes that are lowered to vector selects. - if (P->getParent() != OrigLoop->getHeader()) { - // We know that all PHIs in non-header blocks are converted into - // selects, so we don't have to worry about the insertion order and we - // can just use the builder. - // At this point we generate the predication tree. There may be - // duplications since this is a simple recursive scan, but future - // optimizations will clean it up. - - unsigned NumIncoming = P->getNumIncomingValues(); - - // Generate a sequence of selects of the form: - // SELECT(Mask3, In3, - // SELECT(Mask2, In2, - // ( ...))) - VectorParts Entry(UF); - for (unsigned In = 0; In < NumIncoming; In++) { - VectorParts Cond = - createEdgeMask(P->getIncomingBlock(In), P->getParent()); - - for (unsigned Part = 0; Part < UF; ++Part) { - Value *In0 = getOrCreateVectorValue(P->getIncomingValue(In), Part); - assert((Cond[Part] || NumIncoming == 1) && - "Multiple predecessors with one predecessor having a full mask"); - if (In == 0) - Entry[Part] = In0; // Initialize with the first incoming value. - else - // Select between the current value and the previous incoming edge - // based on the incoming mask. - Entry[Part] = Builder.CreateSelect(Cond[Part], In0, Entry[Part], - "predphi"); - } - } - for (unsigned Part = 0; Part < UF; ++Part) - VectorLoopValueMap.setVectorValue(P, Part, Entry[Part]); - return; - } // This PHINode must be an induction variable. // Make sure that we know about it. @@ -4760,8 +4693,7 @@ case Instruction::Store: case Instruction::Load: - vectorizeMemoryInstruction(&I); - break; + llvm_unreachable("Memory instructions are widened elsewhere"); case Instruction::ZExt: case Instruction::SExt: case Instruction::FPToUI: @@ -7543,8 +7475,11 @@ // Perform the actual loop transformation. // 1. Create a new empty loop. Unlink the old loop and connect the new one. - VPTransformState State{ - BestVF, BestUF, LI, DT, ILV.Builder, ILV.VectorLoopValueMap, &ILV}; + VPCallbackILV CallbackILV(ILV); + + VPTransformState State{BestVF, BestUF, LI, + DT, ILV.Builder, ILV.VectorLoopValueMap, + &ILV, CallbackILV}; State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton(); //===------------------------------------------------===// @@ -7557,7 +7492,7 @@ // 2. Copy and widen instructions from the old loop into the new loop. assert(VPlans.size() == 1 && "Not a single VPlan to execute."); - VPlan *Plan = *VPlans.begin(); + Plan = *VPlans.begin(); Plan->execute(&State); // 3. Fix the vectorized code: take care of header phi's, live-outs, @@ -7752,6 +7687,97 @@ } }; +/// A recipe for vectorizing a phi-node as a sequence of mask-based select +/// instructions. +class VPBlendRecipe : public VPRecipeBase { +private: + PHINode *Phi; + + /// The blend operation is a User of a mask, if not null. + VPUser *User = nullptr; + +public: + VPBlendRecipe(PHINode *Phi, ArrayRef Masks) + : VPRecipeBase(VPBlendSC), Phi(Phi) { + assert((Phi->getNumIncomingValues() == 1 || + Phi->getNumIncomingValues() == Masks.size()) && + "Expected the same number of incoming values and masks"); + if (!Masks.empty()) + User = new VPUser(Masks); + } + + ~VPBlendRecipe() { + if (User) + delete User; + } + + /// Method to support type inquiry through isa, cast, and dyn_cast. + static inline bool classof(const VPRecipeBase *V) { + return V->getVPRecipeID() == VPRecipeBase::VPBlendSC; + } + + /// Generate the phi/select nodes. 
+ void execute(VPTransformState &State) override { + State.ILV->setDebugLocFromInst(State.Builder, Phi); + // We know that all PHIs in non-header blocks are converted into + // selects, so we don't have to worry about the insertion order and we + // can just use the builder. + // At this point we generate the predication tree. There may be + // duplications since this is a simple recursive scan, but future + // optimizations will clean it up. + + unsigned NumIncoming = Phi->getNumIncomingValues(); + + assert((User || NumIncoming == 1) && + "Multiple predecessors with predecessors having a full mask"); + // Generate a sequence of selects of the form: + // SELECT(Mask3, In3, + // SELECT(Mask2, In2, + // ( ...))) + InnerLoopVectorizer::VectorParts Entry(State.UF); + for (unsigned In = 0; In < NumIncoming; ++In) { + for (unsigned Part = 0; Part < State.UF; ++Part) { + // We might have single edge PHIs (blocks) - use an identity + // 'select' for the first PHI operand. + Value *In0 = + State.ILV->getOrCreateVectorValue(Phi->getIncomingValue(In), Part); + if (In == 0) + Entry[Part] = In0; // Initialize with the first incoming value. + else { + // Select between the current value and the previous incoming edge + // based on the incoming mask. + Value *Cond = State.get(User->getOperand(In), Part); + Entry[Part] = + State.Builder.CreateSelect(Cond, In0, Entry[Part], "predphi"); + } + } + } + for (unsigned Part = 0; Part < State.UF; ++Part) + State.ValueMap.setVectorValue(Phi, Part, Entry[Part]); + } + + /// Print the recipe. + void print(raw_ostream &O, const Twine &Indent) const override { + O << " +\n" << Indent << "\"BLEND "; + Phi->printAsOperand(O, false); + O << " ="; + if (!User) { + // Not a User of any mask: not really blending, this is a + // single-predecessor phi. + O << " "; + Phi->getIncomingValue(0)->printAsOperand(O, false); + } else { + for (unsigned I = 0, E = User->getNumOperands(); I < E; ++I) { + O << " "; + Phi->getIncomingValue(I)->printAsOperand(O, false); + O << "/"; + User->getOperand(I)->printAsOperand(O); + } + } + O << "\\l\""; + } +}; + /// VPInterleaveRecipe is a recipe for transforming an interleave group of load /// or stores into one wide load/store and shuffles. class VPInterleaveRecipe : public VPRecipeBase { @@ -7839,13 +7865,18 @@ /// A recipe for generating conditional branches on the bits of a mask. class VPBranchOnMaskRecipe : public VPRecipeBase { private: - /// The input IR basic block used to obtain the mask providing the condition - /// bits for the branch. - BasicBlock *MaskedBasicBlock; + VPUser *User = nullptr; public: - VPBranchOnMaskRecipe(BasicBlock *BB) - : VPRecipeBase(VPBranchOnMaskSC), MaskedBasicBlock(BB) {} + VPBranchOnMaskRecipe(VPValue *BlockInMask) : VPRecipeBase(VPBranchOnMaskSC) { + if (BlockInMask) // nullptr means all-one mask. + User = new VPUser({BlockInMask}); + } + + ~VPBranchOnMaskRecipe() { + if (User) + delete User; + } /// Method to support type inquiry through isa, cast, and dyn_cast. static inline bool classof(const VPRecipeBase *V) { @@ -7858,9 +7889,12 @@ /// Print the recipe. void print(raw_ostream &O, const Twine &Indent) const override { - O << " +\n" - << Indent << "\"BRANCH-ON-MASK-OF " << MaskedBasicBlock->getName() - << "\\l\""; + O << " +\n" << Indent << "\"BRANCH-ON-MASK "; + if (User) + O << *User->getOperand(0); + else + O << " All-One"; + O << "\\l\""; } }; @@ -7896,6 +7930,55 @@ << "\\l\""; } }; + +/// A recipe for generating a binary operation out of two VPInstructions. 
+/// TODO: We currently execute only per-part unless a specific instance is +/// provided. +class VPWidenMemoryInstructionRecipe : public VPRecipeBase { +private: + Instruction &Instr; + VPUser *User = nullptr; + +public: + VPWidenMemoryInstructionRecipe(Instruction &Instr, VPValue *Mask) + : VPRecipeBase(VPWidenMemoryInstructionSC), Instr(Instr) { + if (Mask) // Create a VPInstruction to register as a user of the mask. + User = new VPUser({Mask}); + } + + ~VPWidenMemoryInstructionRecipe() { + if (User) + delete User; + } + + /// Method to support type inquiry through isa, cast, and dyn_cast. + static inline bool classof(const VPRecipeBase *V) { + return V->getVPRecipeID() == VPRecipeBase::VPWidenMemoryInstructionSC; + } + + /// Generate the wide load/store. + void execute(VPTransformState &State) override { + if (!User) + return State.ILV->vectorizeMemoryInstruction(&Instr); + + // Last (and currently only) operand is a mask. + InnerLoopVectorizer::VectorParts MaskValues(State.UF); + VPValue *Mask = User->getOperand(User->getNumOperands() - 1); + for (unsigned Part = 0; Part < State.UF; ++Part) + MaskValues[Part] = State.get(Mask, Part); + State.ILV->vectorizeMemoryInstruction(&Instr, &MaskValues); + } + + /// Print the recipe. + void print(raw_ostream &O, const Twine &Indent) const override { + O << " +\n" << Indent << "\"WIDEN " << VPlanIngredient(&Instr); + if (User) { + O << ", "; + User->getOperand(0)->printAsOperand(O); + } + O << "\\l\""; + } +}; } // end anonymous namespace bool LoopVectorizationPlanner::getDecisionAndClampRange( @@ -7918,14 +8001,89 @@ /// vectorization decision can potentially shorten this sub-range during /// buildVPlan(). void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF) { + + // Collect conditions feeding internal conditional branches; they need to be + // represented in VPlan for it to model masking. + SmallPtrSet NeedDef; + + auto *Latch = OrigLoop->getLoopLatch(); + for (BasicBlock *BB : OrigLoop->blocks()) { + if (BB == Latch) + continue; + BranchInst *Branch = dyn_cast(BB->getTerminator()); + if (Branch && Branch->isConditional()) + NeedDef.insert(Branch->getCondition()); + } + for (unsigned VF = MinVF; VF < MaxVF + 1;) { VFRange SubRange = {VF, MaxVF + 1}; - VPlan *Plan = buildVPlan(SubRange); + VPlan *Plan = buildVPlan(SubRange, NeedDef); VPlans.push_back(Plan); VF = SubRange.End; } } +VPValue *LoopVectorizationPlanner::createEdgeMask(BasicBlock *Src, + BasicBlock *Dst) { + assert(is_contained(predecessors(Dst), Src) && "Invalid edge"); + + // Look for cached value. + std::pair Edge(Src, Dst); + auto ECEntryIt = EdgeMaskCache.find(Edge); + if (ECEntryIt != EdgeMaskCache.end()) + return ECEntryIt->second; + + VPValue *SrcMask = createBlockInMask(Src); + + // The terminator has to be a branch inst! + BranchInst *BI = dyn_cast(Src->getTerminator()); + assert(BI && "Unexpected terminator found"); + + if (!BI->isConditional()) + return EdgeMaskCache[Edge] = SrcMask; + + VPValue *EdgeMask = Plan->getVPValue(*BI->getCondition()); + assert(EdgeMask && "No Edge Mask found for condition"); + + if (BI->getSuccessor(0) != Dst) + EdgeMask = Builder.createNot(EdgeMask); + + if (SrcMask) // Otherwise block in-mask is all-one, no need to AND. + EdgeMask = Builder.createAnd(EdgeMask, SrcMask); + + return EdgeMaskCache[Edge] = EdgeMask; +} + +VPValue *LoopVectorizationPlanner::createBlockInMask(BasicBlock *BB) { + // Look for cached value. 
+ auto BCEntryIt = BlockMaskCache.find(BB); + if (BCEntryIt != BlockMaskCache.end()) + return BCEntryIt->second; + + // All-one mask is modelled as no-mask following the convention for masked + // load/store/gather/scatter. Initialize BlockMask to no-mask. + VPValue *BlockMask = nullptr; + + // Loop incoming mask is all-one. + if (OrigLoop->getHeader() == BB) + return BlockMaskCache[BB] = BlockMask; + + // This is the block mask. We OR all incoming edges. + for (auto *Predecessor : predecessors(BB)) { + VPValue *EdgeMask = createEdgeMask(Predecessor, BB); + if (!EdgeMask) // Mask of predecessor is all-one so mask of block is too. + return BlockMaskCache[BB] = EdgeMask; + + if (!BlockMask) { // BlockMask has its initialized nullptr value. + BlockMask = EdgeMask; + continue; + } + + BlockMask = Builder.createOr(BlockMask, EdgeMask); + } + return BlockMaskCache[BB] = BlockMask; +} + VPInterleaveRecipe * LoopVectorizationPlanner::tryToInterleaveMemory(Instruction *I, VFRange &Range) { @@ -7953,6 +8111,32 @@ return new VPInterleaveRecipe(IG); } +VPWidenMemoryInstructionRecipe * +LoopVectorizationPlanner::tryToWidenMemory(Instruction *I, VFRange &Range) { + if (!isa(I) && !isa(I)) + return nullptr; + + auto willWiden = [&](unsigned VF) -> bool { + if (VF == 1) + return false; + LoopVectorizationCostModel::InstWidening Decision = + CM.getWideningDecision(I, VF); + assert(Decision != LoopVectorizationCostModel::CM_Unknown && + "CM decision should be taken at this point."); + assert(Decision != LoopVectorizationCostModel::CM_Interleave && + "Interleave memory opportunity should be caught earlier."); + return Decision != LoopVectorizationCostModel::CM_Scalarize; + }; + + if (!getDecisionAndClampRange(willWiden, Range)) + return nullptr; + + VPValue *Mask = + Legal->isMaskRequired(I) ? createBlockInMask(I->getParent()) : nullptr; + + return new VPWidenMemoryInstructionRecipe(*I, Mask); +} + VPWidenIntOrFpInductionRecipe * LoopVectorizationPlanner::tryToOptimizeInduction(Instruction *I, VFRange &Range) { @@ -7987,6 +8171,29 @@ return nullptr; } +VPBlendRecipe *LoopVectorizationPlanner::tryToBlend(Instruction *I) { + PHINode *Phi = dyn_cast(I); + if (!Phi || Phi->getParent() == OrigLoop->getHeader()) + return nullptr; + // We know that all PHIs in non-header blocks are converted into selects, so + // we don't have to worry about the insertion order and we can just use the + // builder. At this point we generate the predication tree. There may be + // duplications since this is a simple recursive scan, but future + // optimizations will clean it up. 
+ + SmallVector Masks; + unsigned NumIncoming = Phi->getNumIncomingValues(); + for (unsigned In = 0; In < NumIncoming; In++) { + VPValue *EdgeMask = + createEdgeMask(Phi->getIncomingBlock(In), Phi->getParent()); + assert((EdgeMask || NumIncoming == 1) && + "Multiple predecessors with one having a full mask"); + if (EdgeMask) + Masks.push_back(EdgeMask); + } + return new VPBlendRecipe(Phi, Masks); +} + bool LoopVectorizationPlanner::tryToWiden(Instruction *I, VPBasicBlock *VPBB, VFRange &Range) { @@ -8066,13 +8273,10 @@ return UseVectorIntrinsic || !NeedToScalarize; } if (isa(I) || isa(I)) { - LoopVectorizationCostModel::InstWidening Decision = - CM.getWideningDecision(I, VF); - assert(Decision != LoopVectorizationCostModel::CM_Unknown && - "CM decision should be taken at this point."); - assert(Decision != LoopVectorizationCostModel::CM_Interleave && - "Interleave memory opportunity should be caught earlier."); - return Decision != LoopVectorizationCostModel::CM_Scalarize; + assert(CM.getWideningDecision(I, VF) == + LoopVectorizationCostModel::CM_Scalarize && + "Memory widening decisions should have been taken care by now"); + return false; } return true; }; @@ -8132,10 +8336,13 @@ // Instructions marked for predication are replicated and placed under an // if-then construct to prevent side-effects. + // Generate recipes to compute the block mask for this region. + VPValue *BlockInMask = createBlockInMask(Instr->getParent()); + // Build the triangular if-then region. std::string RegionName = (Twine("pred.") + Instr->getOpcodeName()).str(); assert(Instr->getParent() && "Predicated instruction not in any basic block"); - auto *BOMRecipe = new VPBranchOnMaskRecipe(Instr->getParent()); + auto *BOMRecipe = new VPBranchOnMaskRecipe(BlockInMask); auto *Entry = new VPBasicBlock(Twine(RegionName) + ".entry", BOMRecipe); auto *PHIRecipe = Instr->getType()->isVoidTy() ? nullptr : new VPPredInstPHIRecipe(Instr); @@ -8151,8 +8358,11 @@ return Region; } -VPlan *LoopVectorizationPlanner::buildVPlan(VFRange &Range) { - +VPlan * +LoopVectorizationPlanner::buildVPlan(VFRange &Range, + const SmallPtrSetImpl &NeedDef) { + EdgeMaskCache.clear(); + BlockMaskCache.clear(); DenseMap &SinkAfter = Legal->getSinkAfter(); DenseMap SinkAfterInverse; @@ -8172,7 +8382,11 @@ // Create a dummy pre-entry VPBasicBlock to start building the VPlan. VPBasicBlock *VPBB = new VPBasicBlock("Pre-Entry"); - VPlan *Plan = new VPlan(VPBB); + Plan = new VPlan(VPBB); + + // Represent values that will have defs inside VPlan. + for (Value *V : NeedDef) + Plan->addVPValue(*V); // Scan the body of the loop in a topological order to visit each basic block // after having visited its predecessor basic blocks. @@ -8186,6 +8400,7 @@ auto *FirstVPBBForBB = new VPBasicBlock(BB->getName()); VPBB->setOneSuccessor(FirstVPBBForBB); VPBB = FirstVPBBForBB; + Builder.setInsertPoint(VPBB); std::vector Ingredients; @@ -8243,11 +8458,21 @@ continue; } + // Check if Instr is a memory operation that should be widened. + if ((Recipe = tryToWidenMemory(Instr, Range))) { + VPBB->appendRecipe(Recipe); + continue; + } + // Check if Instr should form some PHI recipe. 
if ((Recipe = tryToOptimizeInduction(Instr, Range))) { VPBB->appendRecipe(Recipe); continue; } + if ((Recipe = tryToBlend(Instr))) { + VPBB->appendRecipe(Recipe); + continue; + } if (PHINode *Phi = dyn_cast(Instr)) { VPBB->appendRecipe(new VPWidenPHIRecipe(Phi)); continue; @@ -8339,14 +8564,16 @@ unsigned Part = State.Instance->Part; unsigned Lane = State.Instance->Lane; - auto Cond = State.ILV->createBlockInMask(MaskedBasicBlock); - - Value *ConditionBit = Cond[Part]; - if (!ConditionBit) // Block in mask is all-one. + Value *ConditionBit = nullptr; + if (!User) // Block in mask is all-one. ConditionBit = State.Builder.getTrue(); - else if (ConditionBit->getType()->isVectorTy()) - ConditionBit = State.Builder.CreateExtractElement( - ConditionBit, State.Builder.getInt32(Lane)); + else { + VPValue *BlockInMask = User->getOperand(0); + ConditionBit = State.get(BlockInMask, Part); + if (ConditionBit->getType()->isVectorTy()) + ConditionBit = State.Builder.CreateExtractElement( + ConditionBit, State.Builder.getInt32(Lane)); + } // Replace the temporary unreachable terminator with a new conditional branch, // whose two destinations will be set later when they are created. @@ -8356,9 +8583,6 @@ auto *CondBr = BranchInst::Create(State.CFG.PrevBB, nullptr, ConditionBit); CondBr->setSuccessor(0, nullptr); ReplaceInstWithInst(CurrentTerminator, CondBr); - - DEBUG(dbgs() << "\nLV: vectorizing BranchOnMask recipe " - << MaskedBasicBlock->getName()); } void VPPredInstPHIRecipe::execute(VPTransformState &State) { Index: lib/Transforms/Vectorize/VPlan.h =================================================================== --- lib/Transforms/Vectorize/VPlan.h +++ lib/Transforms/Vectorize/VPlan.h @@ -15,8 +15,10 @@ /// treated as proper graphs for generic algorithms; /// 3. Pure virtual VPRecipeBase serving as the base class for recipes contained /// within VPBasicBlocks; -/// 4. The VPlan class holding a candidate for vectorization; -/// 5. The VPlanPrinter class providing a way to print a plan in dot format. +/// 4. VPInstruction, a concrete Recipe and VPUser modeling a single planned +/// instruction; +/// 5. The VPlan class holding a candidate for vectorization; +/// 6. The VPlanPrinter class providing a way to print a plan in dot format; /// These are documented in docs/VectorizationPlan.rst. /// //===----------------------------------------------------------------------===// @@ -24,6 +26,7 @@ #ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_H #define LLVM_TRANSFORMS_VECTORIZE_VPLAN_H +#include "VPlanValue.h" #include "llvm/ADT/GraphTraits.h" #include "llvm/ADT/SmallSet.h" #include "llvm/ADT/ilist.h" @@ -35,7 +38,7 @@ // refactoring. namespace { // Forward declarations. -//class InnerLoopVectorizer; +// class InnerLoopVectorizer; class LoopVectorizationLegality; class LoopVectorizationCostModel; } // namespace @@ -77,6 +80,8 @@ /// getScalarValue functions, which assert that the desired value exists. struct VectorizerValueMap { + friend struct VPTransformState; + private: /// The unroll factor. Each entry in the vector map contains UF vector values. unsigned UF; @@ -190,15 +195,23 @@ } }; +/// This class is used to enable the VPlan to invoke a method of ILV. This is +/// needed until the method is refactored out of ILV and becomes reusable. +struct VPCallback { + virtual ~VPCallback() {} + virtual Value *getOrCreateVectorValues(Value *V, unsigned Part) = 0; +}; + /// VPTransformState holds information passed down when "executing" a VPlan, /// needed for generating the output IR. 
struct VPTransformState { VPTransformState(unsigned VF, unsigned UF, class LoopInfo *LI, class DominatorTree *DT, IRBuilder<> &Builder, - VectorizerValueMap &ValueMap, InnerLoopVectorizer *ILV) + VectorizerValueMap &ValueMap, InnerLoopVectorizer *ILV, + VPCallback &Callback) : VF(VF), UF(UF), Instance(), LI(LI), DT(DT), Builder(Builder), - ValueMap(ValueMap), ILV(ILV) {} + ValueMap(ValueMap), ILV(ILV), Callback(Callback) {} /// The chosen Vectorization and Unroll Factors of the loop being vectorized. unsigned VF; @@ -209,6 +222,34 @@ /// instructions. Optional Instance; + struct DataState { + /// A type for vectorized values in the new loop. Each value from the + /// original loop, when vectorized, is represented by UF vector values in + /// the new unrolled loop, where UF is the unroll factor. + typedef SmallVector PerPartValuesTy; + + DenseMap PerPartOutput; + } Data; + + /// Get the generated Value for a given VPValue and a given Part. If such a + /// Value does not exist by the time this method is called, then this Def + /// represents Values that are still generated by ILV via ValueMap. + Value *get(VPValue *Def, unsigned Part) { + if (Data.PerPartOutput.count(Def)) + return Data.PerPartOutput[Def][Part]; + // Bring the Values from ValueMap. + return Callback.getOrCreateVectorValues(VPValue2Value[Def], Part); + } + + /// Set the generated Value for a given VPValue and a given Part. + void set(VPValue *Def, Value *V, unsigned Part) { + if (!Data.PerPartOutput.count(Def)) { + DataState::PerPartValuesTy Entry(UF); + Data.PerPartOutput[Def] = Entry; + } + Data.PerPartOutput[Def][Part] = V; + } + /// Hold state information used when constructing the CFG of the output IR, /// traversing the VPBasicBlocks and generating corresponding IR BasicBlocks. struct CFGState { @@ -240,8 +281,14 @@ /// Values of the output IR. VectorizerValueMap &ValueMap; + /// Hold a reference to a mapping between VPValues in VPlan and original + /// Values they correspond to. + VPValue2ValueTy VPValue2Value; + /// Hold a pointer to InnerLoopVectorizer to reuse its IR generation methods. class InnerLoopVectorizer *ILV; + + VPCallback &Callback; }; /// VPBlockBase is the building block of the Hierarchical Control-Flow Graph. @@ -445,11 +492,14 @@ /// SubclassID field of the VPRecipeBase objects. They are used for concrete /// type identification. typedef enum { + VPBlendSC, VPBranchOnMaskSC, + VPInstructionSC, VPInterleaveSC, VPPredInstPHISC, VPReplicateSC, VPWidenIntOrFpInductionSC, + VPWidenMemoryInstructionSC, VPWidenPHISC, VPWidenSC, } VPRecipeTy; @@ -475,6 +525,52 @@ virtual void print(raw_ostream &O, const Twine &Indent) const = 0; }; +/// This is a concrete Recipe that models a single VPlan-level instruction. +/// While as any Recipe it may generate a sequence of IR instructions when +/// executed, these instructions would always form a single-def expression as +/// the VPInstruction is also a single def-use vertex. +class VPInstruction : public VPUser, public VPRecipeBase { +public: + /// VPlan opcodes, extending LLVM IR with idiomatics instructions. + enum { Not = Instruction::OtherOpsEnd + 1 }; + +private: + typedef unsigned char OpcodeTy; + OpcodeTy Opcode; + + /// Utility method serving execute(): generates a single instance of the + /// modeled instruction. 
+ void generateInstruction(VPTransformState &State, unsigned Part); + +public: + VPInstruction(unsigned Opcode, std::initializer_list Operands) + : VPUser(VPValue::VPInstructionSC, Operands), + VPRecipeBase(VPRecipeBase::VPInstructionSC), Opcode(Opcode) {} + + /// Method to support type inquiry through isa, cast, and dyn_cast. + static inline bool classof(const VPValue *V) { + return V->getVPValueID() == VPValue::VPInstructionSC; + } + + /// Method to support type inquiry through isa, cast, and dyn_cast. + static inline bool classof(const VPRecipeBase *R) { + return R->getVPRecipeID() == VPRecipeBase::VPInstructionSC; + } + + unsigned getOpcode() const { return Opcode; } + + /// Generate the instruction. + /// TODO: We currently execute only per-part unless a specific instance is + /// provided. + void execute(VPTransformState &State) override; + + /// Print the Recipe. + void print(raw_ostream &O, const Twine &Indent) const override; + + /// Print the VPInstruction. + void print(raw_ostream &O) const; +}; + /// VPBasicBlock serves as the leaf of the Hierarchical Control-Flow Graph. It /// holds a sequence of zero or more VPRecipe's each representing a sequence of /// output IR instructions. @@ -531,15 +627,17 @@ return V->getVPBlockID() == VPBlockBase::VPBasicBlockSC; } - /// Augment the existing recipes of a VPBasicBlock with an additional - /// \p Recipe as the last recipe. - void appendRecipe(VPRecipeBase *Recipe) { + void insert(VPRecipeBase *Recipe, iterator InsertPt) { assert(Recipe && "No recipe to append."); assert(!Recipe->Parent && "Recipe already in VPlan"); Recipe->Parent = this; - return Recipes.push_back(Recipe); + Recipes.insert(InsertPt, Recipe); } + /// Augment the existing recipes of a VPBasicBlock with an additional + /// \p Recipe as the last recipe. + void appendRecipe(VPRecipeBase *Recipe) { insert(Recipe, end()); } + /// The method which generates the output IR instructions that correspond to /// this VPBasicBlock, thereby "executing" the VPlan. void execute(struct VPTransformState *State) override; @@ -612,6 +710,8 @@ /// Hierarchical-CFG of VPBasicBlocks and VPRegionBlocks rooted at an Entry /// VPBlock. class VPlan { + friend class VPlanPrinter; + private: /// Hold the single entry to the Hierarchical CFG of the VPlan. VPBlockBase *Entry; @@ -622,12 +722,18 @@ /// Holds the name of the VPlan, for printing. std::string Name; + /// Holds a mapping between Values and their corresponding VPValue inside + /// VPlan. + Value2VPValueTy Value2VPValue; + public: VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {} ~VPlan() { if (Entry) VPBlockBase::deleteCFG(Entry); + for (auto &MapEntry : Value2VPValue) + delete MapEntry.second; } /// Generate the IR code for this VPlan. @@ -646,6 +752,13 @@ void setName(const Twine &newName) { Name = newName.str(); } + void addVPValue(Value &V) { + if (!Value2VPValue.count(&V)) + Value2VPValue[&V] = new VPValue(); + } + + VPValue *getVPValue(Value &V) { return Value2VPValue[&V]; } + private: /// Add to the given dominator tree the header block and every new basic block /// that was created between it and the latch block, inclusive. 
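Before moving on to VPlan.cpp, a minimal sketch of the per-part code-generation pattern that the VPTransformState::get/set accessors above enable. The widened-negation helper below is hypothetical and not part of the patch; only State.UF, State.Builder, State.get and State.set are taken from it.

// Sketch only: a hypothetical helper showing how a recipe produces one IR
// value per unroll part through VPTransformState. "Neg" stands for any
// single-operand VPInstruction.
static void executePerPartSketch(VPInstruction &Neg, VPTransformState &State) {
  for (unsigned Part = 0; Part < State.UF; ++Part) {
    // Fetch the vector value generated for the operand in this unroll part;
    // if it is still produced by ILV, get() falls back to the VPCallback.
    Value *A = State.get(Neg.getOperand(0), Part);
    // Emit the corresponding IR for this part.
    Value *V = State.Builder.CreateNeg(A);
    // Record the result so later users of Neg can retrieve it via get().
    State.set(&Neg, V, Part);
  }
}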
Index: lib/Transforms/Vectorize/VPlan.cpp =================================================================== --- lib/Transforms/Vectorize/VPlan.cpp +++ lib/Transforms/Vectorize/VPlan.cpp @@ -29,6 +29,14 @@ #define DEBUG_TYPE "vplan" +raw_ostream &llvm::operator<<(raw_ostream &OS, const VPValue &V) { + if (const VPInstruction *Instr = dyn_cast(&V)) + Instr->print(OS); + else + V.printAsOperand(OS); + return OS; +} + /// \return the VPBasicBlock that is the entry of Block, possibly indirectly. const VPBasicBlock *VPBlockBase::getEntryBasicBlock() const { const VPBlockBase *Block = this; @@ -196,10 +204,83 @@ State->Instance.reset(); } +void VPInstruction::generateInstruction(VPTransformState &State, + unsigned Part) { + IRBuilder<> &Builder = State.Builder; + switch (getOpcode()) { + case Instruction::UDiv: + case Instruction::SDiv: + case Instruction::SRem: + case Instruction::URem: + case Instruction::Add: + case Instruction::FAdd: + case Instruction::Sub: + case Instruction::FSub: + case Instruction::Mul: + case Instruction::FMul: + case Instruction::FDiv: + case Instruction::FRem: + case Instruction::Shl: + case Instruction::LShr: + case Instruction::AShr: + case Instruction::And: + case Instruction::Or: + case Instruction::Xor: { + Value *A = State.get(getOperand(0), Part); + Value *B = State.get(getOperand(1), Part); + Value *V = Builder.CreateBinOp((Instruction::BinaryOps)getOpcode(), A, B); + State.set(this, V, Part); + break; + } + case VPInstruction::Not: { + Value *A = State.get(getOperand(0), Part); + Value *V = Builder.CreateNot(A); + State.set(this, V, Part); + break; + } + default: + llvm_unreachable("Unsupported opcode for instruction"); + } +} + +void VPInstruction::execute(VPTransformState &State) { + assert(!State.Instance && "VPInstruction executing an Instance"); + for (unsigned Part = 0; Part < State.UF; ++Part) + generateInstruction(State, Part); +} + +void VPInstruction::print(raw_ostream &O, const Twine &Indent) const { + O << " +\n" << Indent << "\"EMIT "; + print(O); + O << "\\l\""; +} + +void VPInstruction::print(raw_ostream &O) const { + printAsOperand(O); + O << " = "; + + switch (getOpcode()) { + case VPInstruction::Not: + O << "not"; + break; + default: + O << Instruction::getOpcodeName(getOpcode()); + } + + for (const VPValue *Operand : operands()) { + O << " "; + Operand->printAsOperand(O); + } +} + /// Generate the code inside the body of the vectorized loop. Assumes a single /// LoopVectorBody basic-block was created for this. Introduce additional /// basic-blocks as needed, and fill them all. void VPlan::execute(VPTransformState *State) { + // 0. Set the reverse mapping from VPValues to Values for code generation. 
+ for (auto &Entry : Value2VPValue) + State->VPValue2Value[Entry.second] = Entry.first; + BasicBlock *VectorPreHeaderBB = State->CFG.PrevBB; BasicBlock *VectorHeaderBB = VectorPreHeaderBB->getSingleSuccessor(); assert(VectorHeaderBB && "Loop preheader does not have a single successor."); @@ -300,6 +381,14 @@ OS << "graph [labelloc=t, fontsize=30; label=\"Vectorization Plan"; if (!Plan.getName().empty()) OS << "\\n" << DOT::EscapeString(Plan.getName()); + if (!Plan.Value2VPValue.empty()) { + OS << ", where:"; + for (auto Entry : Plan.Value2VPValue) { + OS << "\\n" << *Entry.second; + OS << DOT::EscapeString(" := "); + Entry.first->printAsOperand(OS, false); + } + } OS << "\"]\n"; OS << "node [shape=rect, fontname=Courier, fontsize=30]\n"; OS << "edge [fontname=Courier, fontsize=30]\n"; Index: lib/Transforms/Vectorize/VPlanBuilder.h =================================================================== --- lib/Transforms/Vectorize/VPlanBuilder.h +++ lib/Transforms/Vectorize/VPlanBuilder.h @@ -0,0 +1,61 @@ +//===- VPlanBuilder.h - A VPlan utility for constructing VPInstructions ---===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. +// +//===----------------------------------------------------------------------===// +/// +/// \file +/// This file provides a VPlan-based builder utility analogous to IRBuilder. +/// It provides an instruction-level API for generating VPInstructions while +/// abstracting away the Recipe manipulation details. +//===----------------------------------------------------------------------===// + +#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_BUILDER_H +#define LLVM_TRANSFORMS_VECTORIZE_VPLAN_BUILDER_H + +#include "VPlan.h" + +namespace llvm { + +class VPBuilder { +private: + VPBasicBlock *BB = nullptr; + VPBasicBlock::iterator InsertPt = VPBasicBlock::iterator(); + + VPInstruction *createInstruction(unsigned Opcode, + std::initializer_list Operands) { + VPInstruction *Instr = new VPInstruction(Opcode, Operands); + BB->insert(Instr, InsertPt); + return Instr; + } + +public: + VPBuilder() {} + + /// \brief This specifies that created VPInstructions should be appended to + /// the end of the specified block. + void setInsertPoint(VPBasicBlock *TheBB) { + assert(TheBB && "Attempting to set a null insert point"); + BB = TheBB; + InsertPt = BB->end(); + } + + VPValue *createNot(VPValue *Operand) { + return createInstruction(VPInstruction::Not, {Operand}); + } + + VPValue *createAnd(VPValue *LHS, VPValue *RHS) { + return createInstruction(Instruction::BinaryOps::And, {LHS, RHS}); + } + + VPValue *createOr(VPValue *LHS, VPValue *RHS) { + return createInstruction(Instruction::BinaryOps::Or, {LHS, RHS}); + } +}; + +} // namespace llvm + +#endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_BUILDER_H Index: lib/Transforms/Vectorize/VPlanValue.h =================================================================== --- lib/Transforms/Vectorize/VPlanValue.h +++ lib/Transforms/Vectorize/VPlanValue.h @@ -0,0 +1,146 @@ +//===- VPlanValue.h - Represent Values in Vectorizer Plan -----------------===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. +// +//===----------------------------------------------------------------------===// +/// +/// \file +/// This file contains the declarations of the entities induced by Vectorization +/// Plans, e.g. 
the instructions the VPlan intends to generate if executed. +/// VPlan models the following entities: +/// VPValue +/// |-- VPUser +/// | |-- VPInstruction +/// These are documented in docs/VectorizationPlan.rst. +/// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_VALUE_H +#define LLVM_TRANSFORMS_VECTORIZE_VPLAN_VALUE_H + +#include "llvm/ADT/DenseMap.h" +#include "llvm/ADT/SmallVector.h" +#include "llvm/IR/Value.h" +#include "llvm/Support/Debug.h" +#include "llvm/Support/raw_ostream.h" + +namespace llvm { + +// Forward declarations. +class VPUser; + +// This is the base class of the VPlan Def/Use graph, used for modeling the data +// flow into, within and out of the VPlan. VPValues can stand for live-ins +// coming from the input IR, instructions which VPlan will generate if executed +// and live-outs which the VPlan will need to fix accordingly. +class VPValue { +private: + const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast). + + SmallVector Users; + +protected: + VPValue(const unsigned char SC) : SubclassID(SC) {} + +public: + /// An enumeration for keeping track of the concrete subclass of VPValue that + /// are actually instantiated. Values of this enumeration are kept in the + /// SubclassID field of the VPValue objects. They are used for concrete + /// type identification. + enum { VPValueSC, VPUserSC, VPInstructionSC }; + + VPValue() : SubclassID(VPValueSC) {} + VPValue(const VPValue &) = delete; + VPValue &operator=(const VPValue &) = delete; + + /// \return an ID for the concrete type of this object. + /// This is used to implement the classof checks. This should not be used + /// for any other purpose, as the values may change as LLVM evolves. + unsigned getVPValueID() const { return SubclassID; } + + void printAsOperand(raw_ostream &OS) const { + OS << "%vp" << (unsigned short)(unsigned long long)this; + } + + unsigned getNumUsers() const { return Users.size(); } + void addUser(VPUser &User) { Users.push_back(&User); } + + typedef SmallVectorImpl::iterator user_iterator; + typedef SmallVectorImpl::const_iterator const_user_iterator; + typedef iterator_range user_range; + typedef iterator_range const_user_range; + + user_iterator user_begin() { return Users.begin(); } + const_user_iterator user_begin() const { return Users.begin(); } + user_iterator user_end() { return Users.end(); } + const_user_iterator user_end() const { return Users.end(); } + user_range users() { return user_range(user_begin(), user_end()); } + const_user_range users() const { + return const_user_range(user_begin(), user_end()); + } +}; + +typedef DenseMap Value2VPValueTy; +typedef DenseMap VPValue2ValueTy; + +raw_ostream &operator<<(raw_ostream &OS, const VPValue &V); + +/// This class augments VPValue with operands which provide the inverse def-use +/// edges from VPValue's users to their defs. 
+class VPUser : public VPValue { +private: + SmallVector Operands; + + void addOperand(VPValue *Operand) { + Operands.push_back(Operand); + Operand->addUser(*this); + } + +protected: + VPUser(const unsigned char SC) : VPValue(SC) {} + VPUser(const unsigned char SC, ArrayRef Operands) : VPValue(SC) { + for (VPValue *Operand : Operands) + addOperand(Operand); + } + +public: + VPUser() : VPValue(VPValue::VPUserSC) {} + VPUser(ArrayRef Operands) : VPUser(VPValue::VPUserSC, Operands) {} + VPUser(std::initializer_list Operands) + : VPUser(ArrayRef(Operands)) {} + VPUser(const VPUser &) = delete; + VPUser &operator=(const VPUser &) = delete; + + /// Method to support type inquiry through isa, cast, and dyn_cast. + static inline bool classof(const VPValue *V) { + return V->getVPValueID() >= VPUserSC && + V->getVPValueID() <= VPInstructionSC; + } + + unsigned getNumOperands() const { return Operands.size(); } + inline VPValue *getOperand(unsigned N) const { + assert(N < Operands.size() && "Operand index out of bounds"); + return Operands[N]; + } + + typedef SmallVectorImpl::iterator operand_iterator; + typedef SmallVectorImpl::const_iterator const_operand_iterator; + typedef iterator_range operand_range; + typedef iterator_range const_operand_range; + + operand_iterator op_begin() { return Operands.begin(); } + const_operand_iterator op_begin() const { return Operands.begin(); } + operand_iterator op_end() { return Operands.end(); } + const_operand_iterator op_end() const { return Operands.end(); } + operand_range operands() { return operand_range(op_begin(), op_end()); } + const_operand_range operands() const { + return const_operand_range(op_begin(), op_end()); + } +}; + +} // namespace llvm + +#endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_VALUE_H Index: test/Transforms/LoopVectorize/if-conversion-nest.ll =================================================================== --- test/Transforms/LoopVectorize/if-conversion-nest.ll +++ test/Transforms/LoopVectorize/if-conversion-nest.ll @@ -42,9 +42,9 @@ ; CHECK-NEXT: [[TMP13:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD6]], ; CHECK-NEXT: [[TMP14:%.*]] = select <4 x i1> [[TMP13]], <4 x i32> , <4 x i32> ; CHECK-NEXT: [[TMP15:%.*]] = and <4 x i1> [[TMP12]], [[TMP11]] -; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP15]], <4 x i32> , <4 x i32> ; CHECK-NEXT: [[TMP16:%.*]] = xor <4 x i1> [[TMP12]], ; CHECK-NEXT: [[TMP17:%.*]] = and <4 x i1> [[TMP11]], [[TMP16]] +; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP15]], <4 x i32> , <4 x i32> ; CHECK-NEXT: [[PREDPHI7:%.*]] = select <4 x i1> [[TMP17]], <4 x i32> [[TMP14]], <4 x i32> [[PREDPHI]] ; CHECK-NEXT: [[TMP18:%.*]] = bitcast i32* [[TMP7]] to <4 x i32>* ; CHECK-NEXT: store <4 x i32> [[PREDPHI7]], <4 x i32>* [[TMP18]], align 4, !alias.scope !0, !noalias !3
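To close, a minimal sketch of walking the def-use edges introduced by VPlanValue.h above. The helper itself is hypothetical; VPValue::users(), VPUser::operands() and printAsOperand() are the APIs the patch adds.

// Sketch only: traverse the users of a mask VPValue and print each user with
// its operands, using the VPValue/VPUser API added in VPlanValue.h.
static void dumpMaskUsersSketch(VPValue &Mask, raw_ostream &OS) {
  for (VPUser *U : Mask.users()) {      // Users registered via addUser().
    U->printAsOperand(OS);
    OS << " <-";
    for (VPValue *Op : U->operands()) { // Inverse def-use edges (operands).
      OS << ' ';
      Op->printAsOperand(OS);
    }
    OS << '\n';
  }
}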