This is an archive of the discontinued LLVM Phabricator instance.

[LV] Remove dead IV casts using VPlan (NFC).
ClosedPublic

Authored by fhahn on Dec 5 2021, 5:02 AM.

Details

Summary

This patch simplifies handling of dead induction casts, by removing dead
cast instructions after initial VPlan construction. This has the
following benefits:

  1. fixes a crash (see @test_optimized_cast_induction_feeding_first_order_recurrence)
  2. Simplifies VPWidenIntOrFpInduction to a single-def recipes
  3. Retires recordVectorLoopValueForInductionCast.

Diff Detail

Event Timeline

fhahn created this revision.Dec 5 2021, 5:02 AM
fhahn requested review of this revision.Dec 5 2021, 5:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 5 2021, 5:02 AM
Herald added a subscriber: vkmr. · View Herald Transcript
Ayal added a comment.Dec 6 2021, 3:15 AM

Marvelous!
Worth adding in the summary that all redundant casts will now first be represented in VPlan using recipes, instead of being treated as DeadInstructions and rejected out of hand.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
661

Worth retaining some of this explanation, somewhere?

9087

dead >> redundant?

9104

Would be good to ensure that all members of Casts get processed.
How about recording the recipes of Casts upfront, and traverse Casts here retrieving their recipes directly and RAUW them with IV?

Note that this Redundant Cast Elimination can-be/is done before sinkAfter, because such casts can feed only themselves, so they cannot feed candidates that want to sinkAfter them. However, perhaps better to swap them, given that sinkAfter is mandatory whereas RCE is an optimization - presumably code is functionally correct (yet suboptimal) w/o it? Worth outlining to reduce the size of buildVPlanWithVPRecipes() which is already excessive?

9112

Now that the redundant casts became dead - useless, they can be cleaned up with a general VPlan DCE, which may become useful in additional cases. This local self-clean-up is also fine of-course.

fhahn updated this revision to Diff 392102.Dec 6 2021, 9:34 AM

Move transform to VPlanTransforms.cpp, run after sink-after, retain essence of comment.

fhahn updated this revision to Diff 392104.Dec 6 2021, 9:38 AM
fhahn marked 3 inline comments as done.

remove stray newline

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
661

I tried to capture the essence (the bit about why the casts are redundant) in the doc-comment for removeRedundantInductionCasts.

9104

Would be good to ensure that all members of Casts get processed.

I added an assert that we add exactly Casts.size() elements to remove.

How about recording the recipes of Casts upfront, and traverse Casts here retrieving their recipes directly and RAUW them with IV?

I moved it to VPlanTransforms, but retained the logic to discover the casts be traversing IV users. I am not sure if it is worth relying on the extra state in this case, we can get all the information directly from the plan without too much effort. We only need to look through instructions in Casts, so it should be a very small amount of extra work. WDYT?

9112

There's no VPlan DCE yet unfortunately, so I kept it for now. But that's certainly something to follow up on.

Ayal added inline comments.Dec 6 2021, 1:52 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9104

I moved it to VPlanTransforms, but retained the logic to discover the casts be traversing IV users. I am not sure if it is worth relying on the extra state in this case, we can get all the information directly from the plan without too much effort. We only need to look through instructions in Casts, so it should be a very small amount of extra work. WDYT?

Doing

for (auto &Cast : Casts)
  RecipeBuilder.getRecipe(Cast)->getVPSingleValue()->replaceAllUsesWith(IV);

does seem (more) straightforward, albeit dependent on recording Casts in RecipeBuilder, and does not convey nor check that Casts are chained from IV.
Amount of extra work is indeed expected to be small, and can be slightly decreased - see below. Furthermore, perhaps at some point this could become a truly VPlan-to-VPlan transformation, which also identifies the redundancy of such casts independently, instead of relying on initial external identification. Ship it!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
308

Used only in assert, needs #ifndef?

313

(This cannot loop forever because Casts excludes the header phi chaining Casts into a cycle.)

There typically will be a few Casts if any, but it does seem somewhat wasteful to go through all users of all Casts. Perhaps loop until the last Cast is reached, saving the redundant traversal over its users? (And reducing the risk of infinite loop if something went wrong?)
Can also repeatedly search for a single next cast among the users of the current CastDef, stopping as soon as one is found, instead of always appending all users for traversal.

fhahn updated this revision to Diff 392416.Dec 7 2021, 8:13 AM
fhahn marked 2 inline comments as done.

Adjust loop to exit once all casts have been seen.

Note that the patch depends on D115111 which adds InductionDecriptor to VPWidenIntOrFpInductionRecipe.

fhahn added inline comments.Dec 7 2021, 8:15 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
313

I adjusted the loop to keep track of the number of casts already removed in the loop and only iterate until NumDeadCasts == Casts.size(). That should ensure that we bail out as soon as all casts have been visited, without adding the need for an extra bailout.

It also removes the need for the assertion below I think, because this is now guaranteed by the loop directly. WDYT?

Ayal added inline comments.Dec 9 2021, 5:31 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8060

Wonder if PrimaryInduction can have redundant casts; and if so, do they get removed now and/or with this patch?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
305

If IV has a TruncInst all of its Casts are to be retained?

308

"in Cast[s]"

313

This is fine, thanks!

Nits:
"while" >> "for", even if the bump is left inside?
In general better pre-compute Casts.size(), but loop is expected to be very short.
Another way of doing this, anticipating/enforcing a chain of casts:

VPRecipeBase *FindMyCast = IV;
for (unsigned NumCastsToRemove =  Casts.size(); NumCastsToRemove > 0; NumCastsToRemove--) {
  VPRecipeBase *FoundUserCast = nullptr;
  for (auto &UserCast : FindMyCast->Users)
    if (UserCast.getNumDefinedValues() == 1 &&
        is_contained(Casts, UserCast.getVPSingleValue()->getUnderlyingValue())) {
      FoundUserCast = &UserCast
      break;
    }
  assert(FoundUserCast && "Missing a cast to remove");
  FoundUserCast->getVPSingleValue()->replaceAllUsesWith(IV);
  DeadCasts.push_back(FoundUserCast);
  FindMyCast = FoundUserCast;
}
fhahn updated this revision to Diff 393166.Dec 9 2021, 7:51 AM

Adjust code as suggested, thanks!

fhahn marked 4 inline comments as done.Dec 9 2021, 8:18 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8060

I tried to construct such a test case, but I don't think that's possible at the moment. We require the trip-count to be computable to start with, but when the primary induction variable has casts, I don't think that would be the case. At least I couldn't construct a test. In any case, I think it would probably crash with the code currently in tree and would be handled correctly with the patch.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
305

For now, the TruncInst handling is completely separate tat the moment; If the Trunc has not been removed as dead, we will create a new (optimized) VPWidenIntOrFpInductionRecipe for the TruncInst, so there is nothing to remove here.

313

Thanks, I updated the code. With that style, we cannot RAUW in the loop, because this will nuke the users of FoundUserCast. I slightly adjusted the code so that DeadCasts also RAUW. Note that this required to adjust main loop a bit, now using the number of phi recipes. The phis() range relies in the first-non-phi iterator, which may be invalidated by removing recipes.

There's also no need to use is_contained. The instructions in Casts are in reverse visitation order and we can use that.

Ayal added inline comments.Dec 9 2021, 10:04 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
313

There's also no need to use is_contained. The instructions in Casts are in reverse visitation order and we can use that.

In this case, how about iterating over Casts in reverse order, searching for the recipe of each Cast (instead of recording and retrieving it directly as raised earlier).

If DeadCasts ("CastsToRemove"?) accumulates all candidate recipes and these are removed in a separate loop at the end, the main loop can more simply iterate over phis. The RAUW loop can loop over the last Casts.size() entries of DeadCasts (in reverse? ;-), or a separate CastsToRAUW can be used which can fill CastsToRemove; or a <Recipe, IV> map can be used to also delay RAUW to the end.

321

Nit: the following reverse order more clearly coveys that checking a single defined value guards getVPSingleValue:

if (UserCast->getNumDefinedValues() == 1 &&
    UserCast->getVPSingleValue()->getUnderlyingValue()) ==
        Casts[NumCastsToRemove - 1])
fhahn updated this revision to Diff 393211.Dec 9 2021, 10:19 AM
fhahn marked 3 inline comments as done.

Updated to use SmallVector<std::pair<VPRecipeBase *, VPValue *>> CastsToRemove to keep phi iteration simpler.

fhahn updated this revision to Diff 393214.Dec 9 2021, 10:25 AM
fhahn marked an inline comment as done.

Changed to iterate over Casts vector instead of count the number of casts.

fhahn marked an inline comment as done.Dec 9 2021, 10:27 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
313

I updated the code to iterate over the Casts vector instead of counting the casts we removed.

I also CastsToRemove to use SmallVector<std::pair<VPRecipeBase *, VPValue *>> and got back to the simpler iterating over phis.

321

Thanks, reordered conditions.

Ayal accepted this revision.Dec 9 2021, 11:04 AM

This looks good to me, thanks!

This revision is now accepted and ready to land.Dec 9 2021, 11:04 AM
Ayal added inline comments.Dec 10 2021, 1:15 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
305

It seems a bit more logical then to have "if (!IV || IV->getTruncInst()) continue", thereby indicating which recipes are (ir)relevant for this transformation.

If an IV with a TruncInst can have Casts, are their recipes removed/not-generated elsewhere? Otherwise, can we assert that there's no TruncInst after we checked that there are Casts?

fhahn updated this revision to Diff 393409.Dec 10 2021, 2:11 AM
fhahn marked an inline comment as done.

Adjust checks as suggested.

fhahn marked an inline comment as done.Dec 10 2021, 2:12 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
305

Thanks, the latest version indeed doesn't really need to check the size of Casts any more, because it directly iterates over them, so there's nothing extra to do it empty. Adjusted!

This revision was landed with ongoing or failed builds.Dec 10 2021, 5:57 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.
Ayal added inline comments.Dec 12 2021, 11:53 PM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
305

there's nothing extra to do it empty. Adjusted!

Good!

If the Trunc has not been removed as dead, we will create a new (optimized) VPWidenIntOrFpInductionRecipe for the TruncInst, so there is nothing to remove here.

ok, agreed. Would be good to add a line explaining why we must bail-out when IV has a Trunc. An IV phi feeding one or more Truncs will have multiple VPWidenIntOrFpInductionRecipes associated with it; its redundant casts however need to be removed exactly once - e.g., when reaching the recipe associated directly with it, having no Trunc, as done here.
(pr35773.ll demonstrates this by failing due to missing the casts to remove, if this bail-out is removed.)

Perhaps this Trunc-of-an-IV optimization should also be done as a VPlan-to-VPlan transformation, placed after Redundant Casts Elimination (which originated in D38948). It goes back to e266efb70bfdd which moved truncation of a vectorized IV to operate on the scalar IV.

325

Note that this order of traversal, going forward from the phi, is the more efficient one for RAUW'ing.
Going backwards towards the phi, following the original order of Casts, would result in repeated RAUW's.