If the incoming previous value of a first-order recurrence is a phi in
the header, go through incoming values from the latch until we find a
non-phi value. Use this as the new Previous, all uses in the header
will be dominated by the original phi, but need to be moved after
the non-phi previous value.
Details
- Reviewers
Ayal spatel gilr - Commits
- rGb8709a9d03f8: [LV] Support fixed order recurrences.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
8512–8513 | This change looks like it might potentially affect reductions too, since now we're potentially not recording the recipe for reductions. Should there be a reduction test for this too, or an assert that Ingredient2Recipe[Inc] is always non-null for reductions? | |
8514 | nit: Should this just be !Ingredient2Recipe[Inc] as I thought LLVM convention wasn't to compare ==/!= with nullptr? | |
8997 | nit: Whitespace change. |
Address comments, make sure we record all first-order recurrence and reduction phi recipes, add assert .
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
8512–8513 | I don't think the change should impact reductions, as the code here only skips recording Inc if it is not already marked for recording. I added an assertion that if we already have a recipe recorded at this point, it must be a reduction or first-order recurrence phi. | |
8514 | I think it would be best to use find() as not to insert new entries. Updated. | |
8997 | removed , thanks! |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
8516 | Should we record recipes for induction phis? It's possible to be the incoming value for later phis. |
Is there any update on this change? It provides some nice improvements to various workloads we have been looking at and would be good to get in.
I'm planning to pick this up again fairly soon. There's some other work ongoing to break things up a bit and model codegen for entry & exit values explicitly. I think it would be good to make the transitions for first-order recurrence first and then update the patch.
Hi, sorry to prod this again, is there any update on this, specifically which reviews/issues is this waiting on? We're very keen to push this forwards as it addresses some important issues for us, we are willing to help with development/patches in any areas that need it.
Unfortunately this seems unlikely, as I was out sick for most of last week and probably some of this week too.
Ping.
Rebased the patch and also updated it to remember any header phis. Corresponding tests have been added in ff5ae948a7230 & 6e1ba62d0dd. I have also added additional runtime testing to llvm-test-suite in rT1846f600f7db.
Note that the current handling is geared towards chains for first order recurrences. This means we widen each first-order recurrence separately and use the incoming value of the previous recurrence in the chain for the current one.
This should be different to modeling a whole chain as a 2nd or higher order recurrence, for which we may be able to generate a single recurrence phi for the whole chain. The approach in the current patch focuses on the case where each phi in the chain has other users, so we would need to generate recurrence phis for each phi in the chain anyways. Dedicated higher-order recurrence support can be added as follow-up, but I expect the gains in practice to be limited.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
8516 | Yes, that's a good point, thanks! Should be fixed in the latest version and I added some extra test. |
Nice reuse of chained FOR's!
Conceptually by looking past previous header phi's, Legal->isFirstOrderRecurrence(Phi) may be more accurately renamed Legal->isFixedOrderRecurrence(Phi) - also potentially recording its associated non-header-phi "Previous", even if the recipe implements this pattern using multiple single-element FOR splices.
Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
8527 | Should the assert be retained - what happens with a header Phi that is not an induction, reduction or FOR? | |
8556 | I.e., is a VPHeaderPHIRecipe? Have a verifier assert that all header phi's are assigned VPHeaderPHIRecipes if desired instead of complicating the logic here? |
Thanks for taking a look!
I think renaming would require to also update the code in IVDescriptors which does the actual recurrence identification. Please let me know if you would like me to rename it accross LVLegality and IVDescriptors.
Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.
Unfortunately it doesn't look like instcombine will be able to fold those shuffles, e.g.: https://llvm.godbolt.org/z/a8YoM6ev9
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
8527 | Yes it should be retained, I restored the original assert. | |
8556 | I updated the code to just check isa<VPHeaderPHIRecipe>. This also surfaced the fact that VPHeaderPHIRecipe::classof was missing checks for VPWidenPointerInduction. This is also fixed. I kept the assertion for now, as this allows us to verify the property of the Ingredient2Recipe mapping. |
Yeah, it seems First is no longer accurate for both LVLegality and IVDescriptors (but is for the recipes). Is there any benefit in caching the non-header-phi Previous in the IVDescriptor, given the iteration-by-iteration implementation?
Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.
Unfortunately it doesn't look like instcombine will be able to fold those shuffles, e.g.: https://llvm.godbolt.org/z/a8YoM6ev9
Worth leaving a TODO somewhere?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
8556 | Good catch re: classof. Wonder about this property we're verifying here - the IR is traversed top-down and is currently visiting the loop header phi's, so any Inc across the backedge which is already recorded must be a header phi (verify that?), which in turn should be mapped to a header phi recipe? | |
9000 | Note that FOR's do not form cyclic dependences (unlike inductions and reduction), hence this loop must terminate. |
Addressed latest comments, thanks!
Updated the naming. I've not added the filed for the non-header previous yet, because its uses currently are minimal.
Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.
Unfortunately it doesn't look like instcombine will be able to fold those shuffles, e.g.: https://llvm.godbolt.org/z/a8YoM6ev9
Worth leaving a TODO somewhere?
I added a TODO at the point where we construct VPFirstOrderRecurrencePHIRecipe.
This LGTM, thanks!
Subject may also reflect that this patch extends First to Fixed.
Adding minor comments.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp | ||
---|---|---|
889 ↗ | (On Diff #453103) | Should this be checking if the non-phi previous dominates all users of the recurrence? |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
8556 | Still unclear why this assert is important here. |
Thanks, I'll update the description in the committed version.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp | ||
---|---|---|
889 ↗ | (On Diff #453103) | I think the current code handles this, because the non-phi incoming value will be part of a fixed-order recurrence we are checking here. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
8556 | I just realized that it looks like I never submitted this comment. I think the assertion doesn't add much, so I removed it in the committed version. Below my original response, if it is worth adding the assertion that could be done as follow up :) The main goal is to provide extra restrictions here. When creating header phis, the only incoming values that should be possible are header phis. All other recipes can only be recorded/added later. It might be overcautious and I could remove it as well.
Moved the change with extra verification to D131989 | |
9000 | Thanks, I added a comment. |
Hello - We are seeing some pretty large regressions from this, I think because of the cost model and MVE not having a very good way at the moment of lowering the shuffles that get produced. MVE doesn't have a 'ext' / 'vector.splice' instruction, and the cost model seems to currently be modelled as a v1 extract. Do you have any objections to changing the cost model to use a SK_Splice?
Heads-up: we have found this revision to be the culprit for an LTO compilation crash.
Here's the stack trace:
F0000 00:00:1663945422.368954 9165 logging.cc:48] assert.h assertion failed at third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9135 in VPlanPtr llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(VFRange &, SmallPtrSetImpl<Instruction *> &, const MapVector<Instruction *, Instruction *> &): VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid" *** Check failure stack trace: *** @ 0x55e5fef6ba84 absl::log_internal::LogMessage::SendToLog() @ 0x55e5fef6b85d absl::log_internal::LogMessage::Flush() @ 0x55e5fef6bdc9 absl::log_internal::LogMessageFatal::~LogMessageFatal() @ 0x55e5fef5c244 __assert_fail @ 0x55e5fe3b66f1 llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes() @ 0x55e5fe3a92cc llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes() @ 0x55e5fe3a8838 llvm::LoopVectorizationPlanner::plan() @ 0x55e5fe3bfa8d llvm::LoopVectorizePass::processLoop() @ 0x55e5fe3c4a34 llvm::LoopVectorizePass::runImpl() @ 0x55e5fe3c542f llvm::LoopVectorizePass::run() @ 0x55e5fd37a472 llvm::detail::PassModel<>::run() @ 0x55e5fec617db llvm::PassManager<>::run() @ 0x55e5fa1d5312 llvm::detail::PassModel<>::run() @ 0x55e5fec6432d llvm::ModuleToFunctionPassAdaptor::run() @ 0x55e5fa1d75d2 llvm::detail::PassModel<>::run() @ 0x55e5fec60bbb llvm::PassManager<>::run() @ 0x55e5fa70498e llvm::lto::opt() @ 0x55e5fa7071c5 llvm::lto::thinBackend()::$_3::operator()() @ 0x55e5fa7070ad llvm::lto::thinBackend() @ 0x55e5fa1c3d06 clang::EmitBackendOutput() @ 0x55e5fa1bfb18 clang::CodeGenAction::ExecuteAction() @ 0x55e5fadcd323 clang::FrontendAction::Execute() @ 0x55e5fad42b5f clang::CompilerInstance::ExecuteAction() @ 0x55e5f9db0f5e clang::ExecuteCompilerInvocation() @ 0x55e5f9da4ca2 cc1_main() @ 0x55e5f9da2489 ExecuteCC1Tool() @ 0x55e5faeed1f7 llvm::function_ref<>::callback_fn<>() @ 0x55e5fedce4ff llvm::CrashRecoveryContext::RunSafely() @ 0x55e5faeeca22 clang::driver::CC1Command::Execute() @ 0x55e5faeae7a7 clang::driver::Compilation::ExecuteCommand() @ 0x55e5faeaead0 clang::driver::Compilation::ExecuteJobs() @ 0x55e5faecc02f clang::driver::Driver::ExecuteCompilation() @ 0x55e5f9da1b1b clang_main() @ 0x7f6b56d77633 __libc_start_main @ 0x55e5f9d9ebaa _start
We're working on a reduced test case.
@fhahn could you take a look?
$ cat /tmp/c.ll target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" define void @gen_interp(ptr %0) { br label %2 2: ; preds = %2, %1 %3 = phi double [ %9, %2 ], [ 0.000000e+00, %1 ] %4 = phi double [ %11, %2 ], [ 0.000000e+00, %1 ] %5 = phi double [ %4, %2 ], [ 0.000000e+00, %1 ] %6 = phi i64 [ %10, %2 ], [ 0, %1 ] %7 = fdiv double %5, %3 %8 = fdiv double 0.000000e+00, %3 %9 = load double, ptr null, align 8 %10 = add nuw nsw i64 %6, 1 %11 = load double, ptr null, align 8 store double %8, ptr %0, align 8 %12 = icmp eq i64 %10, 0 br i1 %12, label %13, label %2 13: ; preds = %2 ret void } $ ./build/rel/bin/opt -passes='loop-vectorize' -disable-output /tmp/c.ll Use before def! opt: ../../llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9136: VPlanPtr llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(VFRange &, SmallPtrSetImpl<Instruction *> &, const MapVector<Inst ruction *, Instruction *> &): Assertion `VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid"' failed.
This change looks like it might potentially affect reductions too, since now we're potentially not recording the recipe for reductions. Should there be a reduction test for this too, or an assert that Ingredient2Recipe[Inc] is always non-null for reductions?