This is a proposal to implement vectorization of conditional statements using BOSCC (branches-on-superword-condition-codes).
LLVM's Loop Vectorizer currently flattens control flow: the guarded code is executed on all paths, and predicate (mask) instructions ensure that only the active lanes take effect. In many cases, executing the instructions of every control path is not optimal; for example, when the mask of a region is false for all lanes, all of that region's work is wasted.
BOSCC inserts branches that skip a region when the predicate at the region's entry is false for all vector lanes.
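The guard itself reduces to an "is any lane active?" test on the region's entry mask. A minimal C sketch, assuming a 4-lane mask represented as a `bool` array; `VF`, `mask_to_scalar` and `boscc_region_needed` are hypothetical names used only for illustration, not LLVM APIs.

```c
#include <stdbool.h>
#include <stdint.h>

#define VF 4 /* hypothetical vectorization factor, matching the 4-wide example */

/* Packs a VF-lane boolean mask into an integer bitmask (bit k set iff lane k
   is active). This mirrors a "VectorToScalarCast" step, e.g. a bitcast of a
   <4 x i1> mask to i4. For the guard only "any bit set" matters, so the
   lane-to-bit order is irrelevant. */
static uint8_t mask_to_scalar(const bool mask[VF]) {
    uint8_t bits = 0;
    for (int k = 0; k < VF; k++)
        if (mask[k])
            bits |= (uint8_t)(1u << k);
    return bits;
}

/* BOSCC guard: the region has to execute iff at least one lane is active;
   otherwise the branch skips it entirely. */
static bool boscc_region_needed(const bool mask[VF]) {
    return mask_to_scalar(mask) != 0;
}
```

For a mask of `{false, true, false, true}` the packed value is `0xA`, so the region runs; for an all-false mask it is `0` and the region is skipped.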
Consider the following loop:
for (unsigned i = 0; i < len; i++) {
  if (X[i]) {
    A[i] = B[i] + C[i];
  } else {
    D[i] = E[i] * F[i];
  }
}
Existing LLVM Flattened-Style Vectorization:
for (unsigned i = 0; i < len; i += 4) {
  VectorMask = (X[i,i+1,i+2,i+3] != <0,0,0,0>);
  FlipVectorMask = XOR VectorMask, <true, true, true, true>;
  Mask.Vector.Store.A[i,i+1,i+2,i+3] = Mask.Vector.Load.B[i,i+1,i+2,i+3] + Mask.Vector.Load.C[i,i+1,i+2,i+3]; // Based on VectorMask
  Mask.Vector.Store.D[i,i+1,i+2,i+3] = Mask.Vector.Load.E[i,i+1,i+2,i+3] * Mask.Vector.Load.F[i,i+1,i+2,i+3]; // Based on FlipVectorMask
}
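To make the cost of flattening concrete, here is a scalar C emulation of the flattened loop, assuming VF = 4 and that `len` is a multiple of 4 (no remainder loop). `flattened_loop` and its returned body counter are hypothetical illustration aids, not LLVM code; each inner `for` over `k` stands in for one masked vector operation.

```c
#include <stdbool.h>
#include <stddef.h>

#define VF 4 /* vectorization factor used in the example */

/* Emulates the flattened (masked) vector loop. Both bodies run for every
   4-lane chunk; the masks only decide which lanes are written.
   Assumes len % VF == 0. Returns how many region bodies were executed. */
static size_t flattened_loop(size_t len, const int *X, int *A, const int *B,
                             const int *C, int *D, const int *E, const int *F) {
    size_t bodies = 0;
    for (size_t i = 0; i < len; i += VF) {
        bool mask[VF], flip[VF];
        for (int k = 0; k < VF; k++) {
            mask[k] = X[i + k] != 0; /* VectorMask */
            flip[k] = !mask[k];      /* FlipVectorMask = VectorMask XOR all-true */
        }
        /* "then" body: masked store to A, executed unconditionally. */
        for (int k = 0; k < VF; k++)
            if (mask[k])
                A[i + k] = B[i + k] + C[i + k];
        bodies++;
        /* "else" body: masked store to D, also executed unconditionally. */
        for (int k = 0; k < VF; k++)
            if (flip[k])
                D[i + k] = E[i + k] * F[i + k];
        bodies++;
    }
    return bodies;
}
```

With 8 elements the function always reports 4 bodies (2 per chunk), even when a chunk's mask is all-false or all-true and one of the two bodies writes nothing.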
BOSCC-Style Vectorization:
for (unsigned i = 0; i < len; i += 4) {
  VectorMask = (X[i,i+1,i+2,i+3] != <0,0,0,0>);
  VectorMaskScalar = VectorToScalarCast VectorMask; // non-zero iff at least one lane is active
  if (VectorMaskScalar) {
    Mask.Vector.Store.A[i,i+1,i+2,i+3] = Mask.Vector.Load.B[i,i+1,i+2,i+3] + Mask.Vector.Load.C[i,i+1,i+2,i+3]; // Based on VectorMask
  }
  FlipVectorMask = XOR VectorMask, <true, true, true, true>;
  FlipVectorMaskScalar = VectorToScalarCast FlipVectorMask;
  if (FlipVectorMaskScalar) {
    Mask.Vector.Store.D[i,i+1,i+2,i+3] = Mask.Vector.Load.E[i,i+1,i+2,i+3] * Mask.Vector.Load.F[i,i+1,i+2,i+3]; // Based on FlipVectorMask
  }
}
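The same loop with BOSCC guards can be emulated in C to see which region bodies are skipped. This is a sketch under the same assumptions (VF = 4, `len` a multiple of 4); `boscc_loop`, `any_lane` and the body counter are hypothetical names, and `any_lane` stands in for the `VectorToScalarCast` + non-zero test.

```c
#include <stdbool.h>
#include <stddef.h>

#define VF 4 /* vectorization factor used in the example */

/* True iff at least one lane of the mask is active (the BOSCC guard). */
static bool any_lane(const bool m[VF]) {
    for (int k = 0; k < VF; k++)
        if (m[k])
            return true;
    return false;
}

/* Emulates the BOSCC-style vector loop: each masked region is entered only
   when its mask has an active lane. Assumes len % VF == 0.
   Returns how many region bodies were actually executed. */
static size_t boscc_loop(size_t len, const int *X, int *A, const int *B,
                         const int *C, int *D, const int *E, const int *F) {
    size_t bodies = 0;
    for (size_t i = 0; i < len; i += VF) {
        bool mask[VF], flip[VF];
        for (int k = 0; k < VF; k++) {
            mask[k] = X[i + k] != 0; /* VectorMask */
            flip[k] = !mask[k];      /* FlipVectorMask */
        }
        if (any_lane(mask)) {        /* if (VectorMaskScalar) */
            for (int k = 0; k < VF; k++)
                if (mask[k])
                    A[i + k] = B[i + k] + C[i + k];
            bodies++;
        }
        if (any_lane(flip)) {        /* if (FlipVectorMaskScalar) */
            for (int k = 0; k < VF; k++)
                if (flip[k])
                    D[i + k] = E[i + k] * F[i + k];
            bodies++;
        }
    }
    return bodies;
}
```

When each 4-lane chunk is uniform (all lanes true or all false), only one body per chunk runs, half the work of the flattened form. When a chunk's mask is mixed, both regions still run and the two guard branches are pure overhead, so BOSCC pays off mainly when all-false masks are common.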
This change introduces the following:
1: BOSCCBlockPlanner: Generates the block layout required for BOSCC blocks in VPlan.
2: New Recipes:
a: VPBranchOnBOSCCGuardSC: Generates the conditional check that guards entry to a vector block.
b: VPBOSCCLiveOutRecipe: Generates PHIs for values that are live out of the guarded vector blocks.
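A live-out PHI is needed whenever a value defined inside a guarded block is used after it, because on the bypass edge that block never ran. A minimal C sketch of our reading of this, assuming a conditional sum reduction (`if (X[i]) sum += A[i];`) that is not part of the original example: the guarded block produces an updated partial-sum vector, and at the join the PHI selects either that value or the accumulator from before the block. `boscc_conditional_sum`, `in_region` and `regions_run` are hypothetical names; the loop marked "live-out PHI" plays the role of the generated PHI.

```c
#include <stdbool.h>
#include <stddef.h>

#define VF 4 /* vectorization factor used in the example */

/* Vectorized "for (i) if (X[i]) sum += A[i];" with one BOSCC-guarded block
   per 4-lane chunk. Assumes len % VF == 0. Stores in *regions_run how many
   times the guarded block was entered. */
static long boscc_conditional_sum(size_t len, const int *X, const int *A,
                                  size_t *regions_run) {
    long vsum[VF] = {0}; /* loop-carried vector accumulator */
    *regions_run = 0;
    for (size_t i = 0; i < len; i += VF) {
        bool mask[VF];
        bool any = false;
        for (int k = 0; k < VF; k++) {
            mask[k] = X[i + k] != 0;
            any = any || mask[k];
        }
        long in_region[VF]; /* value defined inside the guarded block */
        if (any) {
            /* Guarded block: select(mask, vsum + A, vsum). */
            for (int k = 0; k < VF; k++)
                in_region[k] = mask[k] ? vsum[k] + A[i + k] : vsum[k];
            (*regions_run)++;
        }
        /* Live-out PHI at the join: the region's value if the block ran,
           otherwise the accumulator arriving on the bypass edge.
           in_region is only read when it was written. */
        for (int k = 0; k < VF; k++)
            vsum[k] = any ? in_region[k] : vsum[k];
    }
    long total = 0; /* horizontal reduction after the vector loop */
    for (int k = 0; k < VF; k++)
        total += vsum[k];
    return total;
}
```

For `X = {0,0,0,0, 1,0,1,0}` and `A = {1,...,8}`, the first chunk bypasses the block entirely and the PHI forwards the unchanged accumulator; the result is `A[4] + A[6] = 12` with the block entered once.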