This is an archive of the discontinued LLVM Phabricator instance.

[LV] Don't emit the Vector IV if it is not used.
Needs ReviewPublic

Authored by SjoerdMeijer on Apr 17 2020, 3:38 AM.

Details

Summary

The vector induction variable (IV) is created quite early in the pipeline, but may end not being used, and this change avoids emitting the vector IV if that is the case. The scalar IV is splatted into a vector with a code sequence like this:

[[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[INDEX]], i32 0
[[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32>
[[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer

We save these instructions to a list where they are being created, and remove them at the end in fixVectorizedLoop where similar fixes and clean ups are performed.

The obvious benefit of this simple admin job is that it makes the LV's output more compact and readable, this unused code can be confusing to look at, and other cleanup passes have to work a little bit less hard.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Apr 17 2020, 3:38 AM
Ayal added inline comments.Apr 17 2020, 3:59 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1890

This doesn't really broadcast when VF<=1 so should be fine?

1920

Would it be possible to avoid generating the undesired broadcasting by wrapping this CreateSplatIV() under an "if (needsVectorInduction(EntryVal))" that checks if any user !shouldScalarizeInstruction(), analogous to needsScalarInduction(EntryVal) above?

Better avoid generating and carrying potential garbage to begin with if it's easy; or potentially augment cse() clean-up at the end.

SjoerdMeijer marked an inline comment as done.Apr 19 2020, 12:19 PM
SjoerdMeijer added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1920

Would it be possible to avoid generating the undesired broadcasting by wrapping this CreateSplatIV() under an "if (needsVectorInduction(EntryVal))" that checks if any user !shouldScalarizeInstruction(), analogous to needsScalarInduction(EntryVal) above?

I agree that this would be nicer, but it looks like needsScalarInduction is an easier case as it simply looks at orginal loop instructions and see if they need to be scalarised. For the possibly new needsVectorInduction variant, I think we would need to look at the uses of the vector IV in the vector code, which hasn't been created yet. And since we need to look at the uses, which is what this patch is doing, I thought doing this afterwards in the fixup routine was a simple and reasonable compromise. For example, I see cases of masked/load stores using the vector IV, and I don't think I have all the information at this point to check all uses.

Better avoid generating and carrying potential garbage to begin with if it's easy; or potentially augment cse() clean-up at the end.

As I wrote above, I am not sure if it is easier or even possible to avoid generating the dead code. But I will have a look again to see if I haven't missed anything. If this is not possible, I understand your preference is to have some form of DCE running here? I thought about that, but wasn't sure if that would be overkill for just removing the vectorIV...

Ayal added inline comments.Apr 25 2020, 4:32 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1920

This indeed deserves some cleaning up.

The current decision is to vectorize (vs. scalarize) an IV if it or its update have any immediate user that will be vectorized; see documentation noted below. That decision should be aware that fold-mask will add an immediate vector user to the primary induction; suggest to fix that below.

The claim is that (then), this last default case of widenIntOrFpInduction() where EntryVal is scalarized, should not call CreateSplatIV() at all, as EntryVal must have no vector users.

Note that users first marked vector (!isScalarAfterVectorization; causing IV to be marked vector) may later be found to be isProfitableToScalarize; so IV might end up being vectorized with no vector users. But the converse cannot hold -- an initially marked vector IV cannot later be found isProfitableToScalarize because the latter applies to instructions in predicated blocks only.

An alternative approach, if/when supporting splatting scalar IV's into vectors is desired, may be to introduce them lazily, along with the scalar-to-vector broadcasts of ILV's getOrCreateVectorValue(). Going forward though, all such scalars-to-vector/vector-to-scalars should be modeled explicitly, during VPlan construction, rather than during a recipe's execution.

ILV does employ a self-cleaning cse() post-pass. But going forward, better have things cleaned up in VPlan.

4604

Note the above comment.
If IV has a user that does not remain scalar, IV is not to remain scalar.

4616

Should add here:

// Primary induction will be used to feed a vector compare under fold mask.
if (Ind == Legal->getPrimaryInduction() && foldTailByMasking())
  continue;
SjoerdMeijer marked an inline comment as done.Apr 27 2020, 1:04 AM
SjoerdMeijer added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1920

Thanks Ayal for sharing and explaining this! I am looking into this now.

Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2020, 1:04 AM