Page MenuHomePhabricator

[LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector
ClosedPublic

Authored by david-arm on Mar 3 2021, 7:54 AM.

Details

Summary

In places where we create a ConstantVector whose elements are a
linear sequence of the form <start, start + 1, start + 2, ...>
I've changed the code to make use of CreateStepVector, which creates
a vector with the sequence <0, 1, 2, ...>, and a vector addition
operation. This patch is a non-functional change, since the output
from the vectoriser remains unchanged for fixed length vectors and
there are existing asserts that still fire when attempting to use
scalable vectors for vectorising induction variables.

In a later patch we will enable support for scalable vectors
in InnerLoopVectorizer::getStepVector(), which relies upon the new
stepvector intrinsic in IRBuilder::CreateStepVector.

Diff Detail

Event Timeline

david-arm created this revision.Mar 3 2021, 7:54 AM
david-arm requested review of this revision.Mar 3 2021, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2021, 7:54 AM
ctetreau added inline comments.Mar 3 2021, 9:56 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2455–2456

NIT: Since the goal is to eventually support scalable vectors, this cast to FixedVectorType is counterproductive.

david-arm updated this revision to Diff 328165.Mar 4 2021, 7:12 AM
david-arm marked an inline comment as done.
david-arm added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2455–2456

Yep, thanks for the suggestion. I originally left it as FixedVectorType so it would assert for scalable vectors, but you're right that we should assert this explicitly and use VectorType instead.

ctetreau added inline comments.Mar 4 2021, 8:20 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2478

It seems like this does the same thing as the original version. However, I don't think this function does what it claims to do.

The docs say that this computes Val + <StartIdx, StartIdx + Step, StartIdx + 2*Step, ...>

But what this function does is:

StartIdxSplat = <StartIdx, StartIdx, ...>
InitVec = StartIdxSplat + <0, 1, ...>
Step = <InputStep, InputStep, ...>
result = Val + (InitVec * Step)

If the input step is 1, this seems to do the right thing:

StartIdx = 2 // some non-1 start for illustrative purposes
Val = <a, b, c, d>
InputStep = 1

StartIdxSplat = <2, 2, 2, 2>
InitVec = <2, 2, 2, 2> + <0, 1, 2, 3> = <2, 3, 4, 5>
Step = <1, 1, 1, 1>
result = <a, b, c, d> + (<2, 3, 4, 5> * <1, 1, 1, 1>) = <a, b, c, d> + <2, 3, 4, 5>

But if we try a larger input step:

StartIdx = 2 // some non-1 start for illustrative purposes
Val = <a, b, c, d>
InputStep = 2

StartIdxSplat = <2, 2, 2, 2>
InitVec = <2, 2, 2, 2> + <0, 1, 2, 3> = <2, 3, 4, 5>
Step = <2, 2, 2, 2>
result = <a, b, c, d> + (<2, 3, 4, 5> * <2, 2, 2, 2>) = <a, b, c, d> + <4, 6, 8, 10>

The label on the tin says that the first element of the RHS vector should be equal to StartIdx, which it clearly isn't. I haven't scrutinized the floating point codepath, but I assume it has a similar issue.

I believe we need to multiply the step vector by the step before adding the start value.

I feel like we should do something about this. Either assert that the step is 1 and add a fixme, or just fix the function.

ctetreau added inline comments.Mar 4 2021, 8:44 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2478

I suppose another option is to document what the function actually does in the header

david-arm updated this revision to Diff 328475.Mar 5 2021, 4:46 AM
david-arm marked an inline comment as done.
  • Updated documentation for getStepVector
david-arm marked 2 inline comments as done.Mar 5 2021, 4:50 AM
david-arm added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2478

Hi @ctetreau, I had a look and I believe the function to be doing the right thing and we actually have tests that defend this behaviour. For example, see function @non_primary_iv_loop_inv_trunc in llvm/test/Transforms/LoopVectorize/induction-step.ll, which has CHECK lines like this:

; CHECK:         [[TMP3:%.*]] = trunc i64 %step to i32
; CHECK-NEXT:    [[DOTSPLATINSERT5:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0
; CHECK-NEXT:    [[DOTSPLAT6:%.*]] = shufflevector <8 x i32> [[DOTSPLATINSERT5]], <8 x i32> poison, <8 x i32> zeroinitializer
; CHECK-NEXT:    [[TMP4:%.*]] = mul <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[DOTSPLAT6]]

In this case I've tried to update the documentation to reflect this behaviour.

david-arm marked an inline comment as done.
ctetreau accepted this revision.Mar 5 2021, 12:26 PM

lgtm

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2478

Yeah, I guess it checks out. Thanks for updating the doc string to be the actual computation.

start = 0, step = 2, VF = 4, Value = zeroinitializer

i                                     = <2, 4, 6, 8>
i_2 = <8, 8, 8, 8> + <2, 4, 6, 8>     = <10, 12, 14, 16>
i_3 = <8, 8, 8, 8> + <10, 12, 14, 16> = <18, 20, 22, 24>

start = 2, step = 2, VF = 4, Value = zeroinitializer

i                                     = <4, 6, 8, 10>
i_2 = <8, 8, 8, 8> + <4, 6, 8, 10>    = <12, 14, 16, 18>
i_3 = <8, 8, 8, 8> + <12, 14, 16, 18> = <20, 22, 24, 26>
This revision is now accepted and ready to land.Mar 5 2021, 12:26 PM