This is an archive of the discontinued LLVM Phabricator instance.

[LV] Support chained phis as incoming values for first-order recurs.
ClosedPublic

Authored by fhahn on Feb 13 2022, 3:41 AM.

Details

Summary

If the incoming previous value of a first-order recurrence is a phi in
the header, go through incoming values from the latch until we find a
non-phi value. Use this as the new Previous, all uses in the header
will be dominated by the original phi, but need to be moved after
the non-phi previous value.

Diff Detail

Event Timeline

fhahn created this revision.Feb 13 2022, 3:41 AM
fhahn requested review of this revision.Feb 13 2022, 3:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2022, 3:41 AM
david-arm added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8886–8897

This change looks like it might potentially affect reductions too, since now we're potentially not recording the recipe for reductions. Should there be a reduction test for this too, or an assert that Ingredient2Recipe[Inc] is always non-null for reductions?

8888

nit: Should this just be !Ingredient2Recipe[Inc] as I thought LLVM convention wasn't to compare ==/!= with nullptr?

9268

nit: Whitespace change.

fhahn updated this revision to Diff 409651.Feb 17 2022, 7:39 AM

Address comments, make sure we record all first-order recurrence and reduction phi recipes, add assert .

fhahn marked 3 inline comments as done.Feb 17 2022, 7:43 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8886–8897

I don't think the change should impact reductions, as the code here only skips recording Inc if it is not already marked for recording. I added an assertion that if we already have a recipe recorded at this point, it must be a reduction or first-order recurrence phi.

8888

I think it would be best to use find() as not to insert new entries. Updated.

9268

removed , thanks!

fhahn planned changes to this revision.Feb 22 2022, 2:14 AM
fhahn marked 3 inline comments as done.

Need to make some changes before the next round of reviews.

bsmith added a subscriber: bsmith.Feb 28 2022, 3:00 AM
Allen added a subscriber: Allen.Mar 6 2022, 7:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2022, 7:35 PM
TKaipeng added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8869

Should we record recipes for induction phis? It's possible to be the incoming value for later phis.

Is there any update on this change? It provides some nice improvements to various workloads we have been looking at and would be good to get in.

Is there any update on this change? It provides some nice improvements to various workloads we have been looking at and would be good to get in.

I'm planning to pick this up again fairly soon. There's some other work ongoing to break things up a bit and model codegen for entry & exit values explicitly. I think it would be good to make the transitions for first-order recurrence first and then update the patch.

Is there any update on this change? It provides some nice improvements to various workloads we have been looking at and would be good to get in.

I'm planning to pick this up again fairly soon. There's some other work ongoing to break things up a bit and model codegen for entry & exit values explicitly. I think it would be good to make the transitions for first-order recurrence first and then update the patch.

Hi, sorry to prod this again, is there any update on this, specifically which reviews/issues is this waiting on? We're very keen to push this forwards as it addresses some important issues for us, we are willing to help with development/patches in any areas that need it.

fhahn added a comment.Jul 5 2022, 10:29 PM

I'm aka

Is there any update on this change? It provides some nice improvements to various workloads we have been looking at and would be good to get in.

I'm planning to pick this up again fairly soon. There's some other work ongoing to break things up a bit and model codegen for entry & exit values explicitly. I think it would be good to make the transitions for first-order recurrence first and then update the patch.

Hi, sorry to prod this again, is there any update on this, specifically which reviews/issues is this waiting on? We're very keen to push this forwards as it addresses some important issues for us, we are willing to help with development/patches in any areas that need it.

I should be able to wrap up testing for an update by the end of this week.

Hi Florian, any chance of this one landing in time for LLVM 15?

fhahn added a comment.Jul 25 2022, 2:43 AM

Hi Florian, any chance of this one landing in time for LLVM 15?

Unfortunately this seems unlikely, as I was out sick for most of last week and probably some of this week too.

fhahn updated this revision to Diff 448982.Aug 1 2022, 3:52 AM

Ping.

Rebased the patch and also updated it to remember any header phis. Corresponding tests have been added in ff5ae948a7230 & 6e1ba62d0dd. I have also added additional runtime testing to llvm-test-suite in rT1846f600f7db.

Note that the current handling is geared towards chains for first order recurrences. This means we widen each first-order recurrence separately and use the incoming value of the previous recurrence in the chain for the current one.

This should be different to modeling a whole chain as a 2nd or higher order recurrence, for which we may be able to generate a single recurrence phi for the whole chain. The approach in the current patch focuses on the case where each phi in the chain has other users, so we would need to generate recurrence phis for each phi in the chain anyways. Dedicated higher-order recurrence support can be added as follow-up, but I expect the gains in practice to be limited.

fhahn marked an inline comment as done.Aug 1 2022, 3:53 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8869

Yes, that's a good point, thanks! Should be fixed in the latest version and I added some extra test.

fhahn marked an inline comment as done.Aug 11 2022, 3:19 AM

ping :)

Ayal added a comment.Aug 15 2022, 4:25 AM

Ping.

Rebased the patch and also updated it to remember any header phis. Corresponding tests have been added in ff5ae948a7230 & 6e1ba62d0dd. I have also added additional runtime testing to llvm-test-suite in rT1846f600f7db.

Note that the current handling is geared towards chains for first order recurrences. This means we widen each first-order recurrence separately and use the incoming value of the previous recurrence in the chain for the current one.

This should be different to modeling a whole chain as a 2nd or higher order recurrence, for which we may be able to generate a single recurrence phi for the whole chain. The approach in the current patch focuses on the case where each phi in the chain has other users, so we would need to generate recurrence phis for each phi in the chain anyways. Dedicated higher-order recurrence support can be added as follow-up, but I expect the gains in practice to be limited.

Nice reuse of chained FOR's!
Conceptually by looking past previous header phi's, Legal->isFirstOrderRecurrence(Phi) may be more accurately renamed Legal->isFixedOrderRecurrence(Phi) - also potentially recording its associated non-header-phi "Previous", even if the recipe implements this pattern using multiple single-element FOR splices.

Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8908

Should the assert be retained - what happens with a header Phi that is not an induction, reduction or FOR?

8924

I.e., is a VPHeaderPHIRecipe? Have a verifier assert that all header phi's are assigned VPHeaderPHIRecipes if desired instead of complicating the logic here?

fhahn updated this revision to Diff 452656.Aug 15 2022, 6:57 AM
fhahn marked an inline comment as done.

Nice reuse of chained FOR's!
Conceptually by looking past previous header phi's, Legal->isFirstOrderRecurrence(Phi) may be more accurately renamed Legal->isFixedOrderRecurrence(Phi) - also potentially recording its associated non-header-phi "Previous", even if the recipe implements this pattern using multiple single-element FOR splices.

Thanks for taking a look!

I think renaming would require to also update the code in IVDescriptors which does the actual recurrence identification. Please let me know if you would like me to rename it accross LVLegality and IVDescriptors.

Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.

Unfortunately it doesn't look like instcombine will be able to fold those shuffles, e.g.: https://llvm.godbolt.org/z/a8YoM6ev9

fhahn marked an inline comment as done.Aug 15 2022, 7:00 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8908

Yes it should be retained, I restored the original assert.

8924

I updated the code to just check isa<VPHeaderPHIRecipe>. This also surfaced the fact that VPHeaderPHIRecipe::classof was missing checks for VPWidenPointerInduction. This is also fixed.

I kept the assertion for now, as this allows us to verify the property of the Ingredient2Recipe mapping.

Ayal added a comment.Aug 16 2022, 11:41 AM

Nice reuse of chained FOR's!
Conceptually by looking past previous header phi's, Legal->isFirstOrderRecurrence(Phi) may be more accurately renamed Legal->isFixedOrderRecurrence(Phi) - also potentially recording its associated non-header-phi "Previous", even if the recipe implements this pattern using multiple single-element FOR splices.

Thanks for taking a look!

I think renaming would require to also update the code in IVDescriptors which does the actual recurrence identification. Please let me know if you would like me to rename it accross LVLegality and IVDescriptors.

Yeah, it seems First is no longer accurate for both LVLegality and IVDescriptors (but is for the recipes). Is there any benefit in caching the non-header-phi Previous in the IVDescriptor, given the iteration-by-iteration implementation?

Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.

Unfortunately it doesn't look like instcombine will be able to fold those shuffles, e.g.: https://llvm.godbolt.org/z/a8YoM6ev9

Worth leaving a TODO somewhere?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8924

Good catch re: classof.

Wonder about this property we're verifying here - the IR is traversed top-down and is currently visiting the loop header phi's, so any Inc across the backedge which is already recorded must be a header phi (verify that?), which in turn should be mapped to a header phi recipe?

9277

Note that FOR's do not form cyclic dependences (unlike inductions and reduction), hence this loop must terminate.

fhahn updated this revision to Diff 453103.Aug 16 2022, 1:03 PM
fhahn marked 3 inline comments as done.

Addressed latest comments, thanks!

Nice reuse of chained FOR's!

Yeah, it seems First is no longer accurate for both LVLegality and IVDescriptors (but is for the recipes). Is there any benefit in caching the non-header-phi Previous in the IVDescriptor, given the iteration-by-iteration implementation?

Updated the naming. I've not added the filed for the non-header previous yet, because its uses currently are minimal.

Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.

Unfortunately it doesn't look like instcombine will be able to fold those shuffles, e.g.: https://llvm.godbolt.org/z/a8YoM6ev9

Worth leaving a TODO somewhere?

I added a TODO at the point where we construct VPFirstOrderRecurrencePHIRecipe.

Ayal accepted this revision.Aug 17 2022, 1:49 PM

This LGTM, thanks!
Subject may also reflect that this patch extends First to Fixed.
Adding minor comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
889 ↗(On Diff #453103)

Should this be checking if the non-phi previous dominates all users of the recurrence?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8924

Still unclear why this assert is important here.

This revision is now accepted and ready to land.Aug 17 2022, 1:49 PM
fhahn marked an inline comment as done.Aug 18 2022, 11:14 AM

This LGTM, thanks!
Subject may also reflect that this patch extends First to Fixed.
Adding minor comments.

Thanks, I'll update the description in the committed version.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
889 ↗(On Diff #453103)

I think the current code handles this, because the non-phi incoming value will be part of a fixed-order recurrence we are checking here.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8924

I just realized that it looks like I never submitted this comment. I think the assertion doesn't add much, so I removed it in the committed version. Below my original response, if it is worth adding the assertion that could be done as follow up :)

The main goal is to provide extra restrictions here. When creating header phis, the only incoming values that should be possible are header phis. All other recipes can only be recorded/added later. It might be overcautious and I could remove it as well.

Good catch re: classof.

Moved the change with extra verification to D131989

9277

Thanks, I added a comment.

This revision was landed with ongoing or failed builds.Aug 18 2022, 11:16 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.

Hello - We are seeing some pretty large regressions from this, I think because of the cost model and MVE not having a very good way at the moment of lowering the shuffles that get produced. MVE doesn't have a 'ext' / 'vector.splice' instruction, and the cost model seems to currently be modelled as a v1 extract. Do you have any objections to changing the cost model to use a SK_Splice?

I've put up an adjustment in D132308.

bgraur added a subscriber: bgraur.Sep 23 2022, 8:12 AM

Heads-up: we have found this revision to be the culprit for an LTO compilation crash.

Here's the stack trace:

F0000 00:00:1663945422.368954    9165 logging.cc:48] assert.h assertion failed at third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9135 in VPlanPtr llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(VFRange &, SmallPtrSetImpl<Instruction *> &, const MapVector<Instruction *, Instruction *> &): VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid"
*** Check failure stack trace: ***
    @     0x55e5fef6ba84  absl::log_internal::LogMessage::SendToLog()
    @     0x55e5fef6b85d  absl::log_internal::LogMessage::Flush()
    @     0x55e5fef6bdc9  absl::log_internal::LogMessageFatal::~LogMessageFatal()
    @     0x55e5fef5c244  __assert_fail
    @     0x55e5fe3b66f1  llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes()
    @     0x55e5fe3a92cc  llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes()
    @     0x55e5fe3a8838  llvm::LoopVectorizationPlanner::plan()
    @     0x55e5fe3bfa8d  llvm::LoopVectorizePass::processLoop()
    @     0x55e5fe3c4a34  llvm::LoopVectorizePass::runImpl()
    @     0x55e5fe3c542f  llvm::LoopVectorizePass::run()
    @     0x55e5fd37a472  llvm::detail::PassModel<>::run()
    @     0x55e5fec617db  llvm::PassManager<>::run()
    @     0x55e5fa1d5312  llvm::detail::PassModel<>::run()
    @     0x55e5fec6432d  llvm::ModuleToFunctionPassAdaptor::run()
    @     0x55e5fa1d75d2  llvm::detail::PassModel<>::run()
    @     0x55e5fec60bbb  llvm::PassManager<>::run()
    @     0x55e5fa70498e  llvm::lto::opt()
    @     0x55e5fa7071c5  llvm::lto::thinBackend()::$_3::operator()()
    @     0x55e5fa7070ad  llvm::lto::thinBackend()
    @     0x55e5fa1c3d06  clang::EmitBackendOutput()
    @     0x55e5fa1bfb18  clang::CodeGenAction::ExecuteAction()
    @     0x55e5fadcd323  clang::FrontendAction::Execute()
    @     0x55e5fad42b5f  clang::CompilerInstance::ExecuteAction()
    @     0x55e5f9db0f5e  clang::ExecuteCompilerInvocation()
    @     0x55e5f9da4ca2  cc1_main()
    @     0x55e5f9da2489  ExecuteCC1Tool()
    @     0x55e5faeed1f7  llvm::function_ref<>::callback_fn<>()
    @     0x55e5fedce4ff  llvm::CrashRecoveryContext::RunSafely()
    @     0x55e5faeeca22  clang::driver::CC1Command::Execute()
    @     0x55e5faeae7a7  clang::driver::Compilation::ExecuteCommand()
    @     0x55e5faeaead0  clang::driver::Compilation::ExecuteJobs()
    @     0x55e5faecc02f  clang::driver::Driver::ExecuteCompilation()
    @     0x55e5f9da1b1b  clang_main()
    @     0x7f6b56d77633  __libc_start_main
    @     0x55e5f9d9ebaa  _start

We're working on a reduced test case.

@fhahn could you take a look?

$ cat /tmp/c.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

define void @gen_interp(ptr %0) {
  br label %2

2:                                                ; preds = %2, %1
  %3 = phi double [ %9, %2 ], [ 0.000000e+00, %1 ]
  %4 = phi double [ %11, %2 ], [ 0.000000e+00, %1 ]
  %5 = phi double [ %4, %2 ], [ 0.000000e+00, %1 ]
  %6 = phi i64 [ %10, %2 ], [ 0, %1 ]
  %7 = fdiv double %5, %3
  %8 = fdiv double 0.000000e+00, %3
  %9 = load double, ptr null, align 8
  %10 = add nuw nsw i64 %6, 1
  %11 = load double, ptr null, align 8
  store double %8, ptr %0, align 8
  %12 = icmp eq i64 %10, 0
  br i1 %12, label %13, label %2

13:                                               ; preds = %2
  ret void
}
$ ./build/rel/bin/opt -passes='loop-vectorize' -disable-output /tmp/c.ll
Use before def!                                                                                                                                                                                    
opt: ../../llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9136: VPlanPtr llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(VFRange &, SmallPtrSetImpl<Instruction *> &, const MapVector<Inst
ruction *, Instruction *> &): Assertion `VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid"' failed.                                                                                    
$ cat /tmp/c.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

define void @gen_interp(ptr %0) {
  br label %2

2:                                                ; preds = %2, %1
  %3 = phi double [ %9, %2 ], [ 0.000000e+00, %1 ]
  %4 = phi double [ %11, %2 ], [ 0.000000e+00, %1 ]
  %5 = phi double [ %4, %2 ], [ 0.000000e+00, %1 ]
  %6 = phi i64 [ %10, %2 ], [ 0, %1 ]
  %7 = fdiv double %5, %3
  %8 = fdiv double 0.000000e+00, %3
  %9 = load double, ptr null, align 8
  %10 = add nuw nsw i64 %6, 1
  %11 = load double, ptr null, align 8
  store double %8, ptr %0, align 8
  %12 = icmp eq i64 %10, 0
  br i1 %12, label %13, label %2

13:                                               ; preds = %2
  ret void
}
$ ./build/rel/bin/opt -passes='loop-vectorize' -disable-output /tmp/c.ll
Use before def!                                                                                                                                                                                    
opt: ../../llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9136: VPlanPtr llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(VFRange &, SmallPtrSetImpl<Instruction *> &, const MapVector<Inst
ruction *, Instruction *> &): Assertion `VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid"' failed.                                                                                    

Thanks, this should be fixed by D134083