This is an archive of the discontinued LLVM Phabricator instance.

Differential D97861

[LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector
Closed, Public

Authored by david-arm on Mar 3 2021, 7:54 AM.

Details

Reviewers

sdesmalen
kmclaughlin
frasercrmck
ctetreau
bsmith

Commits

rGd70251163f71: [LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector

Summary

In places where we create a ConstantVector whose elements are a
linear sequence of the form <start, start + 1, start + 2, ...>
I've changed the code to make use of CreateStepVector, which creates
a vector with the sequence <0, 1, 2, ...>, and a vector addition
operation. This patch is a non-functional change, since the output
from the vectoriser remains unchanged for fixed length vectors and
there are existing asserts that still fire when attempting to use
scalable vectors for vectorising induction variables.

In a later patch we will enable support for scalable vectors
in InnerLoopVectorizer::getStepVector(), which relies upon the new
stepvector intrinsic in IRBuilder::CreateStepVector.
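
As a rough illustration of the refactoring described in this summary, the sketch below models both ways of building the sequence with plain `std::vector<int>` instead of LLVM values. The function names (`linearSequenceDirect`, `stepVector`, `linearSequenceViaStepVector`, `checkConstructionsAgree`) are hypothetical and exist only for this example; they stand in for the old element-by-element ConstantVector construction and for a step vector followed by a vector add, and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the old approach: materialise every element of
// <start, start + 1, start + 2, ...> one by one.
std::vector<int> linearSequenceDirect(int start, std::size_t numElts) {
  std::vector<int> result;
  result.reserve(numElts);
  for (std::size_t i = 0; i < numElts; ++i)
    result.push_back(start + static_cast<int>(i));
  return result;
}

// Hypothetical stand-in for a step vector: <0, 1, 2, ...>.
std::vector<int> stepVector(std::size_t numElts) {
  std::vector<int> result(numElts);
  for (std::size_t i = 0; i < numElts; ++i)
    result[i] = static_cast<int>(i);
  return result;
}

// Model of the new approach: a step vector plus a splat of the start value.
std::vector<int> linearSequenceViaStepVector(int start, std::size_t numElts) {
  std::vector<int> result = stepVector(numElts);
  for (int &elt : result)
    elt += start; // element-wise add of the splatted start value
  return result;
}

// Both constructions should produce the same fixed-length sequence.
void checkConstructionsAgree(int start, std::size_t numElts) {
  assert(linearSequenceDirect(start, numElts) ==
         linearSequenceViaStepVector(start, numElts));
}
```

In this model, `linearSequenceViaStepVector(2, 4)` yields `<2, 3, 4, 5>`, the same as the direct construction, which is the sense in which the change is non-functional for fixed-length vectors.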

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. Mar 3 2021, 7:54 AM

Herald added a subscriber: hiraditya. Mar 3 2021, 7:54 AM

david-arm requested review of this revision. Mar 3 2021, 7:54 AM

Herald added a project: Restricted Project. Mar 3 2021, 7:54 AM

Herald added a subscriber: llvm-commits.

david-arm added a parent revision: D97299: [IR][SVE] Add new llvm.experimental.stepvector intrinsic. Mar 3 2021, 7:55 AM

ctetreau added inline comments. Mar 3 2021, 9:56 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2455–2456	NIT: Since the goal is to eventually support scalable vectors, this cast to FixedVectorType is counterproductive.

Harbormaster completed remote builds in B91822: Diff 327800. Mar 3 2021, 2:35 PM

david-arm updated this revision to Diff 328165. Mar 4 2021, 7:12 AM

david-arm marked an inline comment as done.

david-arm added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2455–2456	Yep, thanks for the suggestion. I originally left it as FixedVectorType so it would assert for scalable vectors, but you're right that we should assert this explicitly and use VectorType instead.

ctetreau added inline comments. Mar 4 2021, 8:20 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2478	It seems like this does the same thing as the original version. However, I don't think this function does what it claims to do.

The docs say that this computes `Val + <StartIdx, StartIdx + Step, StartIdx + 2*Step, ...>`

But what this function does is:

StartIdxSplat = <StartIdx, StartIdx, ...>
InitVec = StartIdxSplat + <0, 1, ...>
Step = <InputStep, InputStep, ...>
result = Val + (InitVec * Step)

If the input step is 1, this seems to do the right thing:

StartIdx = 2 // some non-1 start for illustrative purposes
Val = <a, b, c, d>
InputStep = 1

StartIdxSplat = <2, 2, 2, 2>
InitVec = <2, 2, 2, 2> + <0, 1, 2, 3> = <2, 3, 4, 5>
Step = <1, 1, 1, 1>
result = <a, b, c, d> + (<2, 3, 4, 5> * <1, 1, 1, 1>) = <a, b, c, d> + <2, 3, 4, 5>

But if we try a larger input step:

StartIdx = 2 // some non-1 start for illustrative purposes
Val = <a, b, c, d>
InputStep = 2

StartIdxSplat = <2, 2, 2, 2>
InitVec = <2, 2, 2, 2> + <0, 1, 2, 3> = <2, 3, 4, 5>
Step = <2, 2, 2, 2>
result = <a, b, c, d> + (<2, 3, 4, 5> * <2, 2, 2, 2>) = <a, b, c, d> + <4, 6, 8, 10>

The label on the tin says that the first element of the RHS vector should be equal to `StartIdx`, which it clearly isn't. I haven't scrutinized the floating point codepath, but I assume it has a similar issue.

I believe we need to multiply the step vector by the step before adding the start value.

I feel like we should do something about this. Either assert that the step is 1 and add a fixme, or just fix the function.
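
The integer computation walked through in this comment can be modelled in a few lines of plain C++. This is only a sketch under stated assumptions: `modelGetStepVector` is a hypothetical name, it works on `std::vector<int>` rather than LLVM IR values, and it covers only the integer path, where each lane is `Val[i] + (StartIdx + i) * Step`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain-C++ model (an assumption for illustration, not LLVM code) of the
// integer path discussed above:
//   result[i] = val[i] + (startIdx + i) * step
std::vector<int> modelGetStepVector(const std::vector<int> &val, int startIdx,
                                    int step) {
  assert(!val.empty() && "expected at least one lane");
  std::vector<int> result(val.size());
  for (std::size_t i = 0; i < val.size(); ++i) {
    // InitVec lane: splat(startIdx) + <0, 1, 2, ...>
    int initVecElt = startIdx + static_cast<int>(i);
    // Val + (InitVec * splat(step))
    result[i] = val[i] + initVecElt * step;
  }
  return result;
}
```

With a zero `val`, a start index of 2 and a step of 2, the model returns `<4, 6, 8, 10>`, matching the second worked example in the comment; with a step of 1 it returns `<2, 3, 4, 5>`.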

ctetreau added inline comments. Mar 4 2021, 8:44 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2478	I suppose another option is to document what the function actually does in the header

Harbormaster completed remote builds in B92068: Diff 328165. Mar 4 2021, 7:00 PM

Updated documentation for getStepVector

david-arm marked 2 inline comments as done. Mar 5 2021, 4:50 AM

david-arm added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2478	Hi @ctetreau, I had a look and I believe the function to be doing the right thing and we actually have tests that defend this behaviour. For example, see function `@non_primary_iv_loop_inv_trunc` in llvm/test/Transforms/LoopVectorize/induction-step.ll, which has CHECK lines like this:

; CHECK:         [[TMP3:%.*]] = trunc i64 %step to i32
; CHECK-NEXT:    [[DOTSPLATINSERT5:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0
; CHECK-NEXT:    [[DOTSPLAT6:%.*]] = shufflevector <8 x i32> [[DOTSPLATINSERT5]], <8 x i32> poison, <8 x i32> zeroinitializer
; CHECK-NEXT:    [[TMP4:%.*]] = mul <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[DOTSPLAT6]]

In this case I've tried to update the documentation to reflect this behaviour.

david-arm added a reviewer: bsmith. Mar 5 2021, 7:35 AM

david-arm marked an inline comment as done.

lgtm

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2478	Yeah, I guess it checks out. Thanks for updating the doc string to be the actual computation.

start = 0, step = 2, VF = 4, Value = zeroinitializer

i                                     = <2, 4, 6, 8>
i_2 = <8, 8, 8, 8> + <2, 4, 6, 8>     = <10, 12, 14, 16>
i_3 = <8, 8, 8, 8> + <10, 12, 14, 16> = <18, 20, 22, 24>

start = 2, step = 2, VF = 4, Value = zeroinitializer

i                                     = <4, 6, 8, 10>
i_2 = <8, 8, 8, 8> + <4, 6, 8, 10>    = <12, 14, 16, 18>
i_3 = <8, 8, 8, 8> + <12, 14, 16, 18> = <20, 22, 24, 26>
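
The `i`, `i_2`, `i_3` progression in this comment can be sketched as repeatedly adding a splat to the previous vector. The model below assumes that the `<8, 8, 8, 8>` splat is `VF * step` (here 4 * 2); `advanceInduction` is a hypothetical helper written for this illustration and does not correspond to a function in the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: advance a widened induction vector by one vector
// iteration by adding splat(VF * step) to every lane, where VF is taken to be
// the number of lanes.
std::vector<int> advanceInduction(const std::vector<int> &iv, int step) {
  assert(!iv.empty() && "expected at least one lane");
  const int stride = static_cast<int>(iv.size()) * step; // VF * step
  std::vector<int> next(iv);
  for (int &lane : next)
    lane += stride;
  return next;
}
```

Starting from `<4, 6, 8, 10>` with a step of 2, one call gives `<12, 14, 16, 18>` and a second gives `<20, 22, 24, 26>`, which is the second sequence listed in the comment.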

This revision is now accepted and ready to land. Mar 5 2021, 12:26 PM

Harbormaster completed remote builds in B92281: Diff 328475. Mar 5 2021, 10:01 PM

david-arm added a child revision: D98715: [LoopVectorize] Add support for scalable vectorization of induction variables. Mar 16 2021, 8:47 AM

Closed by commit rGd70251163f71: [LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector (authored by david-arm). Mar 23 2021, 4:29 AM

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rGd70251163f71: [LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	53 lines

Diff 332614

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[604 lines hidden] protected:

   /// Iteratively sink the scalarized operands of a predicated instruction into
   /// the block that was created for it.
   void sinkScalarOperands(Instruction *PredInst);

   /// Shrinks vector element sizes to the smallest bitwidth they can be legally
   /// represented as.
   void truncateToMinimalBitwidths(VPTransformState &State);

-  /// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...)
+  /// This function adds
+  /// (StartIdx * Step, (StartIdx + 1) * Step, (StartIdx + 2) * Step, ...)
   /// to each vector element of Val. The sequence starts at StartIndex.
   /// \p Opcode is relevant for FP induction variable.
   virtual Value *getStepVector(Value *Val, int StartIdx, Value *Step,
                                Instruction::BinaryOps Opcode =
                                Instruction::BinaryOpsEnd);

   /// Compute scalar induction steps. \p ScalarIV is the scalar induction
   /// variable on which to base the steps, \p Step is the size of the step, and

[1,824 lines hidden] void InnerLoopVectorizer::widenIntOrFpInduction(PHINode *IV, Value *Start,

   Value *ScalarIV = CreateScalarIV(Step);
   if (!Cost->isScalarEpilogueAllowed())
     CreateSplatIV(ScalarIV, Step);
   buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, CastDef, State);
 }

 Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx, Value *Step,
                                           Instruction::BinaryOps BinOp) {
   // Create and check the types.
-  auto *ValVTy = cast<FixedVectorType>(Val->getType());
-  int VLen = ValVTy->getNumElements();
+  assert(isa<FixedVectorType>(Val->getType()) &&
+         "Creation of scalable step vector not yet supported");
+  auto *ValVTy = cast<VectorType>(Val->getType());
+  ElementCount VLen = ValVTy->getElementCount();

   Type *STy = Val->getType()->getScalarType();
   assert((STy->isIntegerTy() || STy->isFloatingPointTy()) &&
          "Induction Step must be an integer or FP");
   assert(Step->getType() == STy && "Step has wrong type");

   SmallVector<Constant *, 8> Indices;

-  if (STy->isIntegerTy()) {
   // Create a vector of consecutive numbers from zero to VF.
-    for (int i = 0; i < VLen; ++i)
-      Indices.push_back(ConstantInt::get(STy, StartIdx + i));
+  VectorType *InitVecValVTy = ValVTy;
+  Type *InitVecValSTy = STy;
+  if (STy->isFloatingPointTy()) {
+    InitVecValSTy =
+        IntegerType::get(STy->getContext(), STy->getScalarSizeInBits());
+    InitVecValVTy = VectorType::get(InitVecValSTy, VLen);
+  }
+  Value *InitVec = Builder.CreateStepVector(InitVecValVTy);
+
+  // Add on StartIdx
+  Value *StartIdxSplat = Builder.CreateVectorSplat(
+      VLen, ConstantInt::get(InitVecValSTy, StartIdx));
+  InitVec = Builder.CreateAdd(InitVec, StartIdxSplat);

-    // Add the consecutive indices to the vector value.
-    Constant *Cv = ConstantVector::get(Indices);
-    assert(Cv->getType() == Val->getType() && "Invalid consecutive vec");
+  if (STy->isIntegerTy()) {
     Step = Builder.CreateVectorSplat(VLen, Step);
     assert(Step->getType() == Val->getType() && "Invalid step vec");
     // FIXME: The newly created binary instructions should contain nsw/nuw flags,
     // which can be found from the original scalar operations.
-    Step = Builder.CreateMul(Cv, Step);
+    Step = Builder.CreateMul(InitVec, Step);
     return Builder.CreateAdd(Val, Step, "induction");
   }

   // Floating point induction.
   assert((BinOp == Instruction::FAdd || BinOp == Instruction::FSub) &&
          "Binary Opcode should be specified for FP induction");
-  // Create a vector of consecutive numbers from zero to VF.
-  for (int i = 0; i < VLen; ++i)
-    Indices.push_back(ConstantFP::get(STy, (double)(StartIdx + i)));
-
-  // Add the consecutive indices to the vector value.
-  // Floating-point operations inherit FMF via the builder's flags.
-  Constant *Cv = ConstantVector::get(Indices);
+  InitVec = Builder.CreateUIToFP(InitVec, ValVTy);
   Step = Builder.CreateVectorSplat(VLen, Step);
-  Value *MulOp = Builder.CreateFMul(Cv, Step);
+  Value *MulOp = Builder.CreateFMul(InitVec, Step);
   return Builder.CreateBinOp(BinOp, Val, MulOp, "induction");
 }

 void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
                                            Instruction *EntryVal,
                                            const InductionDescriptor &ID,
                                            VPValue *Def, VPValue *CastDef,
                                            VPTransformState &State) {
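
The floating-point path of getStepVector in this diff builds the indices as integers, converts them with an unsigned-to-FP cast, multiplies by the step, and then applies the FAdd or FSub opcode. A minimal plain-C++ sketch of that order of operations follows. It is a model under assumptions, not LLVM code: `FPOp` and `modelFPStepVector` are hypothetical names, `double` stands in for the vector element type, and the model assumes a non-negative start index because the conversion is unsigned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two opcodes allowed on the FP path.
enum class FPOp { FAdd, FSub };

// Plain-C++ model of the FP path: integer indices first, then an
// unsigned-to-FP conversion, then the multiply and the final FAdd/FSub.
std::vector<double> modelFPStepVector(const std::vector<double> &val,
                                      int startIdx, double step, FPOp op) {
  assert(startIdx >= 0 && "model converts the indices as unsigned values");
  std::vector<double> result(val.size());
  for (std::size_t i = 0; i < val.size(); ++i) {
    // Integer lane of InitVec: splat(startIdx) + <0, 1, 2, ...>
    unsigned idx = static_cast<unsigned>(startIdx) + static_cast<unsigned>(i);
    // Unsigned-to-FP conversion followed by the multiply with the step.
    double mulOp = static_cast<double>(idx) * step;
    result[i] = (op == FPOp::FAdd) ? val[i] + mulOp : val[i] - mulOp;
  }
  return result;
}
```

For example, with all lanes of `val` equal to 1.0, a start index of 0, a step of 0.5 and `FPOp::FAdd`, the model produces `<1.0, 1.5, 2.0, 2.5>`.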

[2,260 lines hidden] Value *InductionGEP = GetElementPtrInst::Create(

             ConstantInt::get(PhiType, State.VF.getKnownMinValue() * State.UF)),
         "ptr.ind", InductionLoc);
     NewPointerPhi->addIncoming(InductionGEP, LoopLatch);

     // Create UF many actual address geps that use the pointer
     // phi as base and a vectorized version of the step value
     // (<step*0, ..., step*N>) as offset.
     for (unsigned Part = 0; Part < State.UF; ++Part) {
-      SmallVector<Constant *, 8> Indices;
+      Type *VecPhiType = VectorType::get(PhiType, State.VF);
+      Value *StartOffset =
+          ConstantInt::get(VecPhiType, Part * State.VF.getKnownMinValue());
       // Create a vector of consecutive numbers from zero to VF.
-      for (unsigned i = 0; i < State.VF.getKnownMinValue(); ++i)
-        Indices.push_back(
-            ConstantInt::get(PhiType, i + Part * State.VF.getKnownMinValue()));
-      Constant *StartOffset = ConstantVector::get(Indices);
+      StartOffset =
+          Builder.CreateAdd(StartOffset, Builder.CreateStepVector(VecPhiType));

       Value *GEP = Builder.CreateGEP(
           ScStValueType->getPointerElementType(), NewPointerPhi,
           Builder.CreateMul(StartOffset,
                             Builder.CreateVectorSplat(
                                 State.VF.getKnownMinValue(), ScalarStepValue),
                             "vector.gep"));
       State.set(Def, GEP, Part);

[5,158 lines hidden]
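
The per-part offsets in the pointer-induction hunk above follow the same pattern: a splat of `Part * VF` plus a step vector, multiplied by the scalar step. The sketch below models that arithmetic with ordinary integers. `modelPointerOffsets` is a hypothetical name, `vf` is assumed to be a fixed (non-scalable) element count, and the result is the vector of offsets that would feed the GEP rather than any actual address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Plain-C++ model (an assumption for illustration) of the offsets for one
// unroll part:
//   StartOffset = splat(part * vf) + <0, 1, ..., vf - 1>
//   offsets     = StartOffset * splat(scalarStep)
std::vector<std::int64_t> modelPointerOffsets(unsigned part, unsigned vf,
                                              std::int64_t scalarStep) {
  assert(vf > 0 && "expected a non-zero vectorization factor");
  std::vector<std::int64_t> offsets(vf);
  for (unsigned i = 0; i < vf; ++i) {
    // Lane i of the start offset for this part.
    std::int64_t startOffset = static_cast<std::int64_t>(part) * vf + i;
    offsets[i] = startOffset * scalarStep;
  }
  return offsets;
}
```

With `vf = 4` and a scalar step of 8, part 0 gets offsets `<0, 8, 16, 24>` and part 1 gets `<32, 40, 48, 56>`, so consecutive parts continue the same linear sequence.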
