Page MenuHomePhabricator

[VPlan] Replace VPWidenRecipe with VPInstruction (WIP).
Changes PlannedPublic

Authored by fhahn on Feb 16 2020, 10:00 AM.

Details

Summary

This patch replaces VPWidenRecipe with VPInstructions. This is still WIP
and needs to handle a few additional cases, but I wanted to share this
early to make sure the direction makes sense.

With the current version, we get

Expected Passes    : 286
Unexpected Failures: 63

I introduced a new VPWidenInstruction, but maybe it would be better to
just use VPInstruction.

If the direction looks right, I'll also continue with other recipes.

Diff Detail

Unit TestsFailed

TimeTest
1,010 msLLVM-Unit.Transforms/Vectorize/_/VectorizeTests::VPlanHCFGTest.testBuildHCFGInnerLoop
Note: Google Test filter = VPlanHCFGTest.testBuildHCFGInnerLoop [==========] Running 1 test from 1 test case. [----------] Global test environment set-up.
620 msLLVM-Unit.Transforms/Vectorize/_/VectorizeTests::VPlanHCFGTest.testVPInstructionToVPRecipesInner
Note: Google Test filter = VPlanHCFGTest.testVPInstructionToVPRecipesInner [==========] Running 1 test from 1 test case. [----------] Global test environment set-up.
30 msLLVM.Transforms/LoopVectorize::exact.ll
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/opt < /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/test/Transforms/LoopVectorize/exact.ll -loop-vectorize -force-vector-width=4 -S | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/test/Transforms/LoopVectorize/exact.ll
3,340 msLLVM.Transforms/LoopVectorize::explicit_outer_detection.ll
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/opt < /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/test/Transforms/LoopVectorize/explicit_outer_detection.ll -loop-vectorize -enable-vplan-native-path -debug-only=loop-vectorize -S 2>&1 | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/test/Transforms/LoopVectorize/explicit_outer_detection.ll
4,200 msLLVM.Transforms/LoopVectorize::explicit_outer_uniform_diverg_branch.ll
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/opt < /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/test/Transforms/LoopVectorize/explicit_outer_uniform_diverg_branch.ll -loop-vectorize -enable-vplan-native-path -debug-only=loop-vectorize -S 2>&1 | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/test/Transforms/LoopVectorize/explicit_outer_uniform_diverg_branch.ll
View Full Test Results (65 Failed)

Event Timeline

fhahn created this revision.Feb 16 2020, 10:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2020, 10:00 AM
fhahn planned changes to this revision.Feb 16 2020, 2:31 PM

Still needs a way to get defs from other recipes.

fhahn updated this revision to Diff 245836.Feb 21 2020, 6:22 AM

Move parts of to NFC patches, fix almost all failures.

fhahn updated this revision to Diff 247519.Mar 1 2020, 1:22 PM

Use created VPInstructions directly for VPWidenInstruction operands, if available.

Fix most outstanding issues, down to failing tests.

There is still a bit of cleanup and bikeshedding required, but first it would be good to decide on the overall approach.

Ayal added a comment.Mar 11 2020, 3:48 AM

Breaking the recipes into VPInstructions is indeed on the roadmap for evolving VPlan to support transformations, cost modeling, and more.

The approach we’ve taken so far is to first migrate the def-use dependencies between recipes, which originally rely on the underlying immutable def-use dependencies among their IR ingredient Instructions and their Operands, to VPlan VPValue-VPUser dependencies. Once two recipes feed each other via a VPValue-VPUser pair, new VPInstructions can be introduced freely between them. The primary motivation for this approach is to facilitate VPlan def-use analysis and transformations, including reordering recipes, replacing them, and introducing new VPInstructions. Once the uses of all recipes migrate to VPUsers, a rather mechanical process following D70865, any single-def recipe can be converted independently into a VPInstruction. VPWidenRecipe is indeed an easy first candidate, which would break up naturally at this time.

An approach that first breaks recipes into VPInstructions, will migrate all def-use dependencies when the *last* recipe is broken. Other recipes are however more challenging to break than VPWidenRecipe, because they have multiple defs, entail SCEV expansions, un/pack predicated scalars along a to-be-unrolled loop of VF*UF iterations, and more. These design challenges require careful attention, which should preferably be devoted in parallel to facilitating VPlan transformation rather than blocking it.

Breaking VPWidenRecipe into a VPInstruction at this time, by letting dependent recipes continue to rely on their underlying IR Operands, via a lookup for potential VPValues (VPInstructions) that cover their corresponding IR Value def, presumably builds a VPlan def-use graph but in effect does not support introducing a new VPInstruction between a VPWidenInstruction and its users. It seems preferable to first focus on migrating away from the original IR def-use relations, gradually and consistently.

Gil&Ayal.

fhahn updated this revision to Diff 250439.Mar 15 2020, 1:31 PM

Rebased & updated to only use VPWidenInstruction in VPIRBuilder if there is an underlying value.

Breaking the recipes into VPInstructions is indeed on the roadmap for evolving VPlan to support transformations, cost modeling, and more.

The approach we’ve taken so far is to first migrate the def-use dependencies between recipes, which originally rely on the underlying immutable def-use dependencies among their IR ingredient Instructions and their Operands, to VPlan VPValue-VPUser dependencies. Once two recipes feed each other via a VPValue-VPUser pair, new VPInstructions can be introduced freely between them. The primary motivation for this approach is to facilitate VPlan def-use analysis and transformations, including reordering recipes, replacing them, and introducing new VPInstructions. Once the uses of all recipes migrate to VPUsers, a rather mechanical process following D70865, any single-def recipe can be converted independently into a VPInstruction. VPWidenRecipe is indeed an easy first candidate, which would break up naturally at this time.

An approach that first breaks recipes into VPInstructions, will migrate all def-use dependencies when the *last* recipe is broken. Other recipes are however more challenging to break than VPWidenRecipe, because they have multiple defs, entail SCEV expansions, un/pack predicated scalars along a to-be-unrolled loop of VF*UF iterations, and more. These design challenges require careful attention, which should preferably be devoted in parallel to facilitating VPlan transformation rather than blocking it.

Breaking VPWidenRecipe into a VPInstruction at this time, by letting dependent recipes continue to rely on their underlying IR Operands, via a lookup for potential VPValues (VPInstructions) that cover their corresponding IR Value def, presumably builds a VPlan def-use graph but in effect does not support introducing a new VPInstruction between a VPWidenInstruction and its users. It seems preferable to first focus on migrating away from the original IR def-use relations, gradually and consistently.

Thanks for the detailed response! I agree we should focus on migrating the def-use dependencies of the existing recipes. I think doing so for VPWidenRecipe might be more tricky than other recipes, as VPWidenRecipe currently contains a list of instructions. To me it seems breaking VPWidenRecipe up into a new VPInstructions makes modeling the def-use chain easier (the operands are explicit per VPInstruction) and allows us to make progress on 2 fronts: migrating def-uses and getting rid of a recipe. It also should not hinder migrating the def-use dependencies for other recipes, but rather help because we can directly use the VPWidenInstruction as VPValue inputs for other recipes.

I thought a little bit about how to migrate the def-use dependencies for VPWidenRecipe without breaking it up, but couldn't come up with a more straight-forward approach. I might be missing something though and would also be happy to explore other directions. Given that the existing approach in this patch seems to work well in practice (and it makes progress in 2 directions), I think it would allow us to continue moving in the right direction, while we should continue to focus on migrating the def-use dependencies for recipes that cannot be broken up easily.

gilr added a comment.Mar 17 2020, 8:13 AM

I think doing so for VPWidenRecipe might be more tricky than other recipes, as VPWidenRecipe currently contains a list of instructions.
[snip]
I thought a little bit about how to migrate the def-use dependencies for VPWidenRecipe without breaking it up, but couldn't come up with a more straight-forward approach.

VPWidenRecipe wrapped intervals of ingredients just to keep VPlan's memory footprint low. Now that we begin to model def-use relations this mechanism will have to go (it's already suppressed for sink-after). We can start by removing compression altogether in a preliminary patch, turning VPWidenRecipe into a single-ingredient recipe, ready to take VPValue operands.
A following patch can add VPValue operands to VPWidenRecipe by calling getOrAddVPValue() for each IR operand, similar to D70865.
The next patch (perhaps also in parts) can replace VPWidenRecipes with VPInstructions by
(a) binding VP defs to IR users, as proposed
(b) extending the usage of recordRecipe() to ingredients covered by VPInstructions such that calling getRecipe() will return either a VPInstruction, another Recipe or nullptr. For the latter two cases, call getOrAddVPValue().
(c) refactoring VPWidenRecipe's execute()
What do you think?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6983

Not sure why ExternalDef is needed, as Value2VPValue is used for both variant and invariant values.

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
93

Better reuse getRecipe(), allowing it to return nullptr in order to trivially support querying invariant values and ingredients whose recipes are not being recorded.

95

Requires recording the recipes for such ingredients.

llvm/lib/Transforms/Vectorize/VPlan.h
732

Better avoid subclassing VPInstruction until scalar/type is modelled.

fhahn planned changes to this revision.Mar 18 2020, 11:01 AM

I think doing so for VPWidenRecipe might be more tricky than other recipes, as VPWidenRecipe currently contains a list of instructions.
[snip]
I thought a little bit about how to migrate the def-use dependencies for VPWidenRecipe without breaking it up, but couldn't come up with a more straight-forward approach.

VPWidenRecipe wrapped intervals of ingredients just to keep VPlan's memory footprint low. Now that we begin to model def-use relations this mechanism will have to go (it's already suppressed for sink-after). We can start by removing compression altogether in a preliminary patch, turning VPWidenRecipe into a single-ingredient recipe, ready to take VPValue operands.
A following patch can add VPValue operands to VPWidenRecipe by calling getOrAddVPValue() for each IR operand, similar to D70865.
The next patch (perhaps also in parts) can replace VPWidenRecipes with VPInstructions by
(a) binding VP defs to IR users, as proposed
(b) extending the usage of recordRecipe() to ingredients covered by VPInstructions such that calling getRecipe() will return either a VPInstruction, another Recipe or nullptr. For the latter two cases, call getOrAddVPValue().
(c) refactoring VPWidenRecipe's execute()
What do you think?

That sounds good. I'll break up the patch along the lines suggested ASAP.