Code generation now uses the start VPValue of induction recipes.
This makes it possible to adjust the start value of the epilogue
vector loop to use the 'resume' value of the main vector loop.
Fixes #59459.
Paths
| Differential D92132
[LV] Support widened induction variables in epilogue vectorization. ClosedPublic Authored by fhahn on Nov 25 2020, 1:23 PM.
Details Summary Code generation now uses the start VPValue of induction recipes. This makes it possible to adjust the start value of the epilogue Fixes #59459.
Diff Detail
Event Timelinefhahn retitled this revision from [LV] Support widened induction variables in epilogue vectorization (WIP). to [LV] Support widened induction variables in epilogue vectorization..Oct 7 2021, 1:53 PM Comment Actions Instead of changing the generic interface for skeleton creation, how about adding a field to hold the resume value inside EpilogueVectorizerEpilogueLoop? Then inside executePlan() we would do something like this to update the induction widening recipes: EpilogueVectorizerEpilogueLoop *EILV = dyn_cast<EpilogueVectorizerEpilogueLoop>(ILV); if (EILV) { Value *IndStart = EILV->getResumeValue(); assert(IndStart && "Expected valid resume value"); ... } Also don't we need to store some sort of a map to be able to handle loops with multiple widened induction vars (with potentially different resume values)? I'm thinking of a case like this: void foo(int * restrict A, int N) { int i, j, k; for (i = 0, j = 1, k = 2; i < N; i++, j++, k++) A[i] = j + k; }
Comment Actions Hi Florian, a[i] = sin(i)+k; } I am able to see the resume value for induction "i" updated properly now after the patch . --Snip-- %34 = phi i64 [ %11, %28 ], [ 0, %8 ] %35 = and i64 %6, 4294967294 %36 = trunc i64 %35 to i32 %37 = add i32 %36, 2 %38 = insertelement <2 x i64> poison, i64 %34, i32 0 %39 = shufflevector <2 x i64> %38, <2 x i64> poison, <2 x i32> zeroinitializer %40 = or <2 x i64> %39, <i64 0, i64 1> %41 = trunc i64 %34 to i32 %42 = insertelement <2 x i32> poison, i32 %41, i32 0 %43 = shufflevector <2 x i32> %42, <2 x i32> poison, <2 x i32> zeroinitializer %44 = or <2 x i32> %43, <i32 0, i32 1> br label %45 45: ; preds = %45, %33 %46 = phi i64 [ %34, %33 ], [ %55, %45 ] %47 = phi <2 x i64> [ %40, %33 ], [ %56, %45 ] <== updated %48 = phi <2 x i32> [ %44, %33 ], [ %57, %45 ] <== updated --Snip-- I was thinking if we could reuse the incremented widened induction results from the main vector loop. But this seems cleaner and simpler code . Comment Actions Hi Florian, This patch breaks one assert when building compiler-rt. However I noticed that in my work space I lowered the epilog generation thresholds to 8 so got this assert. Here is the general reproducer. ---Snip winden-ce.cpp --- void Add( unsigned int begin,unsigned int end, unsigned int cfi_check) { // Don't fill anything below cfi_check. We can not represent those addresses // in the shadow, and must make sure at codegen to place all valid call // targets above cfi_check. //begin = Max(begin, cfi_check); unsigned short int *s = MemToShadow(begin); unsigned short int *s_end = MemToShadow(end - 1) + 1; unsigned short int sv = ((begin - cfi_check)) + 1; for (; s < s_end; s++, sv++) *s = sv; } clang++ -O3 winden-ce.cpp -S -emit-llvm -mllvm -epilogue-vectorization-minimum-VF=8 LoopVectorize.cpp:2535: virtual llvm::Value* llvm::InnerLoopVectorizer::getStepVector(llvm::Value*, int, llvm::Value*, llvm::Instruction::BinaryOps): Assertion `Step->getType() == STy && "Step has wrong type"' failed. I noticed Main loop Vector splat created in the preheader is with i16 type (unsigned short *) whereas Epilog loop creates vector splat with i32 type. Comment Actions Thanks for taking a look and giving it a try! I now remember what was still missing, handling noprimiary IV cases. I'll work to update the patch, but I think it might need a few more patches to bring other parts of the infrastructure up to speedd. Comment Actions I reworked the patch to use a createInductionResumeValue to create the required resume values as needed. fhahn added a parent revision: D134211: [LV] Create createInductionResumeValue helper..Sep 19 2022, 12:43 PM
Comment Actions Hi I tried to build the patch on top of the latest trunk and facing this issue. /home/amd/venkat/aocc-work/mirror/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10505:24: error: ‘class llvm::VPWidenPointerInductionRecipe’ has no member named ‘getInductionDescriptor’ Any other patch I am missing to add other than this one and its parent ? Comment Actions Thanks for taking a look! Comments should be addressed, also moved getInductionDescriptor definition from separate patch to this one. fhahn added inline comments.
Comment Actions Good improvement, adding various nits.
fhahn marked 3 inline comments as done. Comment ActionsMove test with truncated IV to optimal-epilog-vectorization.ll fhahn added inline comments.
Comment Actions Thanks for accommodating, this looks good to me, with a couple of final nits.
Comment Actions Just one question otherwise patch looks good to me.
fhahn marked an inline comment as done. Comment ActionsRebase on top of additional tests in 26c8632f22a5f49f57410607476c874a45daf3a5 and also update comment in EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton.
Comment Actions Rebase & ping. This also fixes another issue: https://github.com/llvm/llvm-project/issues/59459 Comment Actions Ship it!
This revision is now accepted and ready to land.Dec 20 2022, 12:43 PM This revision was landed with ongoing or failed builds.Dec 21 2022, 5:59 AM Closed by commit rGf69ac9a22dca: [LV] Support widened induction variables in epilogue vectorization. (authored by fhahn). · Explain Why This revision was automatically updated to reflect the committed changes. fhahn added inline comments.
Revision Contents
Diff 466299 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/lib/Transforms/Vectorize/VPlan.h
llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll
llvm/test/Transforms/LoopVectorize/X86/conversion-cost.ll
llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-limitations.ll
llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization.ll
|
unused?