This is an archive of the discontinued LLVM Phabricator instance.

[LV] Generate both scalar and vector integer induction variables
ClosedPublic

Authored by mssimpso on Jul 27 2016, 8:30 AM.

Details

Summary

This patch enables the vectorizer to generate both scalar and vector versions of an integer induction variable for a given loop. Previously, we only generated a scalar induction variable if we knew all its users were going to be scalar. Otherwise, we generated a vector induction variable. In the case of a loop with both scalar and vector users of the induction variable, we would generate the vector induction variable and extract scalar values from it for the scalar users. With this patch, we now generate both versions of the induction variable when there are both scalar and vector users and select which version to use based on whether the user is scalar or vector.

Diff Detail

Event Timeline

mssimpso updated this revision to Diff 65746.Jul 27 2016, 8:30 AM
mssimpso retitled this revision from to [LV] Generate both scalar and vector integer induction variables.
mssimpso updated this object.
mssimpso added reviewers: anemet, mkuper.
mssimpso added subscribers: llvm-commits, mcrosier.
mssimpso updated this revision to Diff 65949.Jul 28 2016, 9:24 AM

Removed unnecessary cast.

mkuper edited edge metadata.Jul 28 2016, 1:15 PM

Out of curiosity - what does the final generate code end up looking with and without this patch, for cases where we have both a scalar and a vector use?
It seems like it should be better, I'm wondering how much.

lib/Transforms/Vectorize/LoopVectorize.cpp
1976

Why "Int"? I mean, it may be called only from widenIntInduction now, but I don't really see a reason to bake it into the name, the logic is independent of the IV type.

Regardless, I find the use of "scalarize" sort of confusing. It evokes the notion of constructing a vector, and then extracting the scalars, which is the opposite of what we're doing. Maybe something like needsScalarInduction, and then a matching name for the variable below?
(If you find the current terminology clearer, I won't insist on this, this could be just my own bias.)

1980

Why do we need !isLoopInvariant(U)? If the use is loop invariant, shouldn't it necessarily be a scalar?

(Assuming we do need it, why two separate ifs, and not a && ?)

Out of curiosity - what does the final generate code end up looking with and without this patch, for cases where we have both a scalar and a vector use?
It seems like it should be better, I'm wondering how much.

If you don't mind looking at the full code, I've pasted below what we generate for the new test case I've added to induction.ll (VF=2, UF=2, after instcombine).

Without this patch:

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.ind = phi <2 x i64> [ <i64 0, i64 1>, %vector.ph ], [ %vec.ind.next, %vector.body ]
  %vec.ind2 = phi <2 x i32> [ <i32 0, i32 1>, %vector.ph ], [ %vec.ind.next5, %vector.body ]
  %step.add = add <2 x i64> %vec.ind, <i64 2, i64 2>
  %step.add3 = add <2 x i32> %vec.ind2, <i32 2, i32 2>
  %3 = add <2 x i32> %broadcast.splat, %vec.ind2
  %4 = add <2 x i32> %broadcast.splat, %step.add3
  %5 = trunc <2 x i32> %3 to <2 x i16>
  %6 = trunc <2 x i32> %4 to <2 x i16>
  %7 = extractelement <2 x i64> %vec.ind, i32 0
  %8 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %7, i32 1
  %9 = extractelement <2 x i64> %vec.ind, i32 1
  %10 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %9, i32 1
  %11 = extractelement <2 x i64> %step.add, i32 0
  %12 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %11, i32 1
  %13 = extractelement <2 x i64> %step.add, i32 1
  %14 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %13, i32 1
  %15 = extractelement <2 x i16> %5, i32 0
  store i16 %15, i16* %8, align 2
  %16 = extractelement <2 x i16> %5, i32 1
  store i16 %16, i16* %10, align 2
  %17 = extractelement <2 x i16> %6, i32 0
  store i16 %17, i16* %12, align 2
  %18 = extractelement <2 x i16> %6, i32 1
  store i16 %18, i16* %14, align 2
  %index.next = add i64 %index, 4
  %vec.ind.next = add <2 x i64> %vec.ind, <i64 4, i64 4>
  %vec.ind.next5 = add <2 x i32> %vec.ind2, <i32 4, i32 4>
  %19 = icmp eq i64 %index.next, %n.vec
  br i1 %19, label %middle.block, label %vector.body, !llvm.loop !0

With this patch:

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.ind2 = phi <2 x i32> [ <i32 0, i32 1>, %vector.ph ], [ %vec.ind.next5, %vector.body ]
  %3 = or i64 %index, 1
  %4 = or i64 %index, 2
  %5 = or i64 %index, 3
  %step.add3 = add <2 x i32> %vec.ind2, <i32 2, i32 2>
  %6 = add <2 x i32> %broadcast.splat, %vec.ind2
  %7 = add <2 x i32> %broadcast.splat, %step.add3
  %8 = trunc <2 x i32> %6 to <2 x i16>
  %9 = trunc <2 x i32> %7 to <2 x i16>
  %10 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %index, i32 1
  %11 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %3, i32 1
  %12 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %4, i32 1
  %13 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %5, i32 1
  %14 = extractelement <2 x i16> %8, i32 0
  store i16 %14, i16* %10, align 2
  %15 = extractelement <2 x i16> %8, i32 1
  store i16 %15, i16* %11, align 2
  %16 = extractelement <2 x i16> %9, i32 0
  store i16 %16, i16* %12, align 2
  %17 = extractelement <2 x i16> %9, i32 1
  store i16 %17, i16* %13, align 2
  %index.next = add i64 %index, 4
  %vec.ind.next5 = add <2 x i32> %vec.ind2, <i32 4, i32 4>
  %18 = icmp eq i64 %index.next, %n.vec
  br i1 %18, label %middle.block, label %vector.body, !llvm.loop !0
lib/Transforms/Vectorize/LoopVectorize.cpp
1976

I don't feel that strongly about the name, so I'm fine with this.

1980

isScalarAfterVectorization returns true for values or instructions not in the loop. Using !isLoopInVariant ensures we're looking at an instruction in the loop that will be scalar. With the &&, we would get a clang-format line break, so I kept it as two if's because I thought it might be easier to read. Let me know if you prefer one over the other.

mssimpso updated this revision to Diff 65986.Jul 28 2016, 1:48 PM
mssimpso edited edge metadata.

Addressed Michael's comments.

  • Rebased.
  • Renamed "shouldScalarizeIntInduciton" to "needsScalarInduction"
mssimpso marked an inline comment as done.Jul 28 2016, 2:04 PM

Out of curiosity - what does the final generate code end up looking with and without this patch, for cases where we have both a scalar and a vector use?
It seems like it should be better, I'm wondering how much.

If you don't mind looking at the full code, I've pasted below what we generate for the new test case I've added to induction.ll (VF=2, UF=2, after instcombine).

Yes, that does look much better, thanks!
(I actually meant post-CG, but the IR looks so much better that doesn't really matter).

lib/Transforms/Vectorize/LoopVectorize.cpp
1975–1986

Oh, so if your only scalar users are outside the loop, then you don't want to generate a scalar IV. Makes sense.
So you still end up with an extract from the vector IV, right?

Anyway, if all what you want to check is whether the user is in the loop, why are you checking isLoopInvariant and not Contains (like isScalarAfterVectorization does)?

mssimpso added inline comments.Jul 28 2016, 2:36 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
1975–1986

So you still end up with an extract from the vector IV, right?

I don't think so. If there is a user of the induction variable outside the loop (but no scalar users in the loop), yes, the induction variable would remain vector. The external induction variable users are handled in fixupIVUsers (this was recently added I think). And I think the fixup's are based on the canonical induction variable.

Anyway, if all what you want to check is whether the user is in the loop, why are you checking isLoopInvariant and not Contains (like isScalarAfterVectorization does)?

isLoopInvariant does the additional check that the value is an instruction. But I'm thinking we should just cast the user to an instruction here and use Contains. That should make this more clear. I'll update the patch.

mkuper accepted this revision.Jul 28 2016, 2:39 PM
mkuper edited edge metadata.

LGTM after the Invariant -> Contains change.

lib/Transforms/Vectorize/LoopVectorize.cpp
1975–1986

The external induction variable users are handled in fixupIVUsers (this was recently added I think). And I think the fixup's are based on the canonical induction variable.

Yes, I've added that, I just wanted to make sure this doesn't interfere, and was too lazy to look up what I actually did. :-)
Anyway, you're right, we'll always have the canonical IV as scalar, so it's fine.

This revision is now accepted and ready to land.Jul 28 2016, 2:39 PM
mssimpso updated this revision to Diff 66012.Jul 28 2016, 2:45 PM
mssimpso edited edge metadata.

Use contains instead of isLoopInvariant.

Thanks for the review, Michael! I'll wait to commit until after D22867 is approved.

anemet edited edge metadata.Jul 29 2016, 10:25 AM

Similarly to other patch, it would be good to describe the changed functionality in this patch -- you only describe the end state. (Sorry to keep banging on this but given the complexity I think it's crucial to keep a good a record. In generally, it's also a good idea to have the description match the proposed commit log. If you use arcanist, passing --verbatim to arc diff supports this flow. )

Essentially the part that is missing is the before picture which I think is this. We used to use a scalar IV *if and only if* there was no need for a vector IV. Otherwise we would still use the vector IV and extract the scalar value from it. With this new patch we can now generate both and use either variant of the IV depending on the user. Correct?

lib/Transforms/Vectorize/LoopVectorize.cpp
1979–1982

Can we do this with std::any_of?

2007

Why are you calling this 'EntryVal'?

Adam,

That's exactly right. I'll be sure to use that language in the commit log. Echoing what you said for the record: Previously, we only used the scalar IV if we knew all users were going to be scalar. In the case of a loop with both scalar and vector users, we would generate the vector IV and extract scalar values from it for the scalar users. The functional change here is that now when there are both scalar and vector users, we generate both versions of the IV and select which version to use based on whether the user is scalar or vector.

lib/Transforms/Vectorize/LoopVectorize.cpp
1975–1986

Yes, I think so! I'll update the patch.

2007

It's the entry value in WidenMap. For the truncation case, the truncate instruction maps to the new narrow IV. I'm happy to rename this though. I probably went with EntryVal because I couldn't think of anything better at the time.

mssimpso updated this revision to Diff 66167.Jul 29 2016, 1:34 PM
mssimpso edited edge metadata.

Addressed Adam's comments.

  • Rebased.
  • Used any_of in needsScalarInduction. I also simplified this a bit by removing the explicit check for the IV itself. At the very least, if the IV is scalar, it's update instruction will also be scalar.
mssimpso updated this revision to Diff 66329.Aug 1 2016, 8:44 AM
mssimpso updated this object.
  • Made needsScalarInduction more readable.
  • Updated summary to reflect functional changes.
anemet added inline comments.Aug 1 2016, 11:03 AM
test/Transforms/LoopVectorize/induction.ll
604–678

I could be wrong, but now this test does not seem to test what it was meant for. I thought the point was to ensure that most of the work to get the vector IV set up is pushed into the preheader. But now it seems that we no longer generate that?

mssimpso added inline comments.Aug 1 2016, 11:20 AM
test/Transforms/LoopVectorize/induction.ll
604–678

Yeah, I think you're right. With the current patch, the vector IV is complete removed after instcombine. We generate both a scalar one and a vector one (because of the store) during vectorization. But because the store is scalarized, instcombine is able to remove the vector IV.

If we add a pre-instcombine check for this test, we could check the original functionality as well. What do you think?

anemet added inline comments.Aug 1 2016, 11:34 AM
test/Transforms/LoopVectorize/induction.ll
604–678

Ah, I didn't see that this was a non-consecutive store. What if you make it consecutive to avoid the store to get scalarized (the non-zero based IV would still require us to create a new IV hopefully)?

mssimpso added inline comments.Aug 1 2016, 11:40 AM
test/Transforms/LoopVectorize/induction.ll
604–678

That will work -- we'll be left with both a scalar and a vector IV after instcombine. I'll update the patch. Thanks!

mssimpso updated this revision to Diff 66352.Aug 1 2016, 12:11 PM

Addressed Adam's comments.

  • Made store consecutive in test case and added a pre-instcombine check. We now generate both scalar and vector IVs for this test.
anemet accepted this revision.Aug 1 2016, 12:30 PM
anemet edited edge metadata.

LGTM!

This revision was automatically updated to reflect the committed changes.