@SjoerdMeijer , test tail-folding-counting-down.ll introduced in D72324 now fails, as it can be vectorized with fold-tail, but is not vectorized due to cost. What's the intention of this test and how should it be changed?
minor comments, looks good to me, thanks.
Sun, Apr 5
Fri, Apr 3
Thu, Apr 2
Wed, Apr 1
Mon, Mar 30
Thanks for following up quickly. The part that extends VPWidenRecipe to have a User for holding its VPValue operands looks ok, but could the patch make sure that these operands are indeed used correctly by ILV::widenInstruction(), instead of moving the latter over to its execute()? The former may involve refactoring the case that deals with Calls, to record cost-based UseVectorIntrinsic/NeedToScalarize decisions when constructing the VPlan/recipe rather than during codegen; and to record InvariantCond of Selects instead of accessing getSCEV(I.getOperand(0)) during codegen, possibly cleaning up to generate ScalarCond OR Cond but not both. In any case, accesses to I's operands should all change to access the User's operands instead, possibly with their underlying ingredient if needed.
Thu, Mar 26
It would indeed be good to have FoldTail handle loops that currently lack a Primary Induction Variable (0, +1). Such an %iv is needed in order to build the "%iv < %TripCount" comparison for masking.
In order to handle loops with reversed (%TripCount, -1) induction variables %riv instead of a PIV, this comparison should use %iv = %TripCount - %riv, or simply compare "%riv > 0" instead.
Note that LV eventually always builds a new PIV to control the vector loop, so best use it instead of an %riv; but this PIV is not (yet) represented in VPlan directly, i.e., in the absence of an original PIV.
There are however non-primary cases other than reversed whose tail would also be good to fold, e.g., when the start index is non-zero or when SCEV can somehow determine the trip count as in PR40816.
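The reversed-induction case above can be sketched as a hypothetical scalar emulation in C (VF and the loop body are illustrative, not from any of the patches): a tail-folded vector loop whose down-counting %riv starts at TripCount masks each lane with "riv > 0", which is equivalent to "iv < TripCount" with iv = TripCount - riv.

```c
/* Hypothetical scalar emulation of a tail-folded vector loop (VF = 4).
 * A down-counting induction variable riv starts at tc; lane l of the
 * mask is active iff riv - l > 0, i.e. iv + l < tc with iv = tc - riv. */
enum { VF = 4 };

void tail_folded_memset(int *a, int tc, int val) {
  for (int riv = tc; riv > 0; riv -= VF) {
    for (int l = 0; l < VF; ++l) {  /* lanes of one "vector" iteration */
      if (riv - l > 0)              /* per-lane mask: riv > 0          */
        a[tc - (riv - l)] = val;    /* iv = tc - riv                   */
    }
  }
}
```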
Wed, Mar 25
Tue, Mar 24
This looks good to me too, thanks, with minor nit.
Fri, Mar 20
Wed, Mar 18
Thanks for following up quickly on https://reviews.llvm.org/D76200#inline-694794
Thanks for following-up quickly on https://reviews.llvm.org/D76200#inline-696071
This looks good to me, thanks!
Added some minor final comments.
Tue, Mar 17
Sun, Mar 15
Sat, Mar 14
Eliminating a redundant back-edge is clearly good. It would be better if such a special-case cleanup could be handled by some subsequent optimization, and possibly generalized. Note however that the remainder loop may or may not be considered a subject for further optimization: LV currently disables unrolling the remainder loop following r231631, OTOH it may be worth vectorizing according to D30247. If desired, optimizations other than vectorization should preferably be taken care of by subsequent passes such as indvars and loop-unroll. LV knows that the trip count of the remainder loop is at most VF*UF, in the absence of overflows and runtime guards, and can make an effort to convey this bound, if GVN and IPSCCP fail to pick it up (referring to PR44547). Unfortunately, introducing an llvm.assume() seems insufficient - perhaps it could be made to work?
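The VF*UF bound mentioned above can be sketched with a hypothetical C skeleton (the VF and UF values are illustrative): the vector loop advances VF*UF iterations per step, so the scalar remainder executes TC % (VF*UF) iterations, strictly fewer than VF*UF.

```c
/* Hypothetical skeleton of a vectorized loop plus its scalar remainder.
 * With VF = 4 and UF = 2, the vector loop advances 8 iterations at a
 * time; the remainder then runs tc % 8 < 8 iterations -- the bound LV
 * could try to convey to later passes. */
enum { VF = 4, UF = 2 };

int remainder_trip_count(int tc) {
  int i = 0;
  while (i + VF * UF <= tc)  /* vector loop */
    i += VF * UF;
  int rem = 0;
  for (; i < tc; ++i)        /* scalar remainder loop */
    ++rem;
  return rem;                /* always < VF * UF */
}
```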
Wed, Mar 11
Breaking the recipes into VPInstructions is indeed on the roadmap for evolving VPlan to support transformations, cost modeling, and more.
Jan 30 2020
Culprit is the following from LoopVectorize.cpp:
Jan 22 2020
Jan 16 2020
This looks good to me, with a last minor nit, thanks!
Jan 15 2020
This looks good to me, thanks!
Jan 12 2020
Agree that it's better to merge the new conditional assume tests with the existing unconditional assume tests in the same file.
Jan 9 2020
The LoopVectorizer/LAA has the ability to add runtime checks for memory accesses that look like they may be single stride accesses, in an attempt to still run vectorized code. This can happen in a boring matrix multiply kernel, for example:
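A hypothetical kernel of this kind (names are illustrative, not from the original patch): the inner-loop access b[k * n + j] has runtime stride n, so LAA can emit an "n == 1" runtime check and vectorize under the assumption of unit stride.

```c
/* Hypothetical matrix-multiply kernel: b[k * n + j] is strided by the
 * runtime value n in the innermost loop, the pattern for which LAA may
 * add a "stride == 1" runtime check in order to still vectorize. */
void matmul(int *restrict res, const int *restrict a,
            const int *restrict b, int n) {
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j) {
      int sum = 0;
      for (int k = 0; k < n; ++k)
        sum += a[i * n + k] * b[k * n + j];  /* stride-n access to b */
      res[i * n + j] = sum;
    }
}
```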
This looks good to me, thanks!
Jan 8 2020
This looks good to me, ship it. Added some final minor optional nits.
Jan 7 2020
Jan 5 2020
Thanks. Perhaps it's better for Legal to declare via getConditionalAssumes() what is being returned, and let LVP decide what to do with them(*), rather than to assume from the start via getAssumesToDrop() that they must all be dropped.
Jan 2 2020
This looks good to me, adding a couple of minor last nits, thanks!
Dec 31 2019
The LoopVectorizer/LAA has the ability to add runtime checks for memory accesses that look like they may be single stride accesses, in an attempt to still run vectorized code. This can happen in a boring matrix multiply kernel, for example: [snip]
However, if we have access to efficient vector gather loads, they are a much better option than vectorizing with runtime checks for a stride of 1.
This adds a check in the place that appears to dictate this, LAA, testing whether the MaskedGather or MaskedScatter would be legal.
Dec 30 2019
Just pointing out for the record that this is extracted from D67905.
Dec 28 2019
Dec 26 2019
This looks good to me, with a couple of final optional comments.
Thanks for making all the changes! More comments inline.
Looks good to me; important step forward teaching ILV to deal with VPValues, lifting dependencies on ingredient operands and other attributes, simplifying recipes.
Dec 23 2019
Currently 'createVectorizedLoopSkeleton' keeps track of newly generated blocks using local variables and reassigns them to corresponding class fields at the end. That makes the code structure a bit harder to understand. This patch uses the class fields directly, and the extra locals are removed.
Dec 20 2019
Dec 19 2019
This looks good to me, with a couple of final nits; thanks!
Dec 18 2019
No need for collectReductionInstructions() any more.
Dec 17 2019
Dec 8 2019
Add test(s) having multiple wrapping reduction instructions.
Dec 7 2019
Dec 6 2019
Looks good to me with minor optional notes.
Consider adding a target-independent test, e.g., augmenting first-order-recurrence.ll; tests can possibly be committed before the fix, demonstrating current behavior and then the change brought by this fix.
Dec 5 2019
Still think it would be better to provide this as a standalone function in Transforms/Utils/LoopUtils, for the potential benefit of the loop unroll (and jam) passes in addition to LV. Having agreed to ignore foldTail and requiresScalarEpilog, there's nothing vectorization-specific to do here. There's still an issue though with the fact that LV may use the scalar loop both for the remaining TC%(VF*UF) iterations when running the vector loop, and for all TC iterations when runtime guards bypass the vector loop. In the absence of information, each such guard could be assigned 0.5 probability, or one could be aggressively optimistic and hope the vector loop is always reached. In any case this deserves a comment.
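The probability reasoning above can be sketched as a hypothetical helper (name and signature illustrative): with probability p of a runtime guard bypassing the vector loop, the scalar loop's expected trip count mixes the full TC with the TC % (VF*UF) remainder.

```c
/* Hypothetical expected trip count of the scalar loop: with probability
 * p_bypass the runtime guards send all tc iterations to the scalar
 * loop; otherwise only the tc % (vf * uf) remainder runs there. */
double expected_scalar_iters(int tc, int vf, int uf, double p_bypass) {
  return p_bypass * tc + (1.0 - p_bypass) * (tc % (vf * uf));
}
```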
Dec 3 2019
Dec 2 2019
Dec 1 2019
Simplified the test as suggested.
Nov 28 2019
Good catch, binary operations that perform reduction must indeed be vectorized w/o wrap flags.
Nov 27 2019
Cleaned-up the #ifndef/LLVM_DEBUG and renamed the lambda.
Nov 26 2019
This looks good to me, thanks, plus a couple of comments.
Nov 20 2019
Ideally we should be using loop size and expected perf gain from the cost model, in addition to loop hotness, to make the decision, but that would be a much bigger change and can be a follow-up step.
Adding a few comments. Would be good to generalize and apply also to loop unroll (and jam).
Nov 18 2019
Nov 15 2019
Nov 12 2019
Nov 9 2019
Nov 7 2019
As revert eaff300 shows, more care must be taken to avoid having one FOR (First Order Recurring) phi sink an instruction that another FOR phi needs to keep in place. Namely, any Previous instruction that feeds a FOR phi across the back-arc must not be allowed to sink, because that may break its dominance over other instructions that use the FOR phi. The original check against "SinkAfter.count(Previous)" ("Cannot rely on dominance due to motion") addresses this concern but only partially; it works if the FOR phi that needs to sink an instruction is handled before the FOR phi that needs to keep it in place. But the two phis could be encountered in the other order.
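A hypothetical scalar analogue with two first-order recurrences (names illustrative, not from the revert) shows the kind of 'Previous' instruction involved: the value x feeds one recurrence across the back-edge while its current-iteration value is also used directly, so an instruction computing x cannot freely sink without care for both users.

```c
/* Hypothetical loop with two first-order recurrences: y[i] uses the
 * previous iteration's x, and z[i] uses both the previous y and the
 * current x. The instruction computing x is the 'Previous' feeding a
 * FOR phi across the back-edge; sinking it to serve one recurrence can
 * break dominance over the other's uses -- the ordering hazard above. */
void two_recurrences(int *y, int *z, const int *a, int n) {
  int x_prev = 0, y_prev = 0;
  for (int i = 0; i < n; ++i) {
    int x = a[i] + 1;   /* 'Previous' feeding the first FOR phi */
    y[i] = x_prev * 2;  /* uses x from the previous iteration   */
    z[i] = y_prev + x;  /* second recurrence also uses current x */
    x_prev = x;
    y_prev = y[i];
  }
}
```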
Nov 2 2019
Thanks for coming back to look into this @fhahn !
Nov 1 2019
Oct 31 2019
Having special treatment for GEPs here does follow the unique way in which GEPs tolerate both scalar/invariant operands and/or vector operands.
Oct 29 2019
This looks good to me, plus some final minor comments, thanks!
Oct 22 2019
Oct 21 2019
Oct 13 2019
If conditional assumes are to be dropped, better do so on entry to VPlan, as in DeadInstructions, rather than representing them in ReplicateRecipe (as unconditional assumes are) and silencing their code generation.
Oct 9 2019
This LGTM, with additional CHECK-NOT to the test, thanks!
Oct 8 2019
Oct 5 2019
The other suggested fix in LoopAccessInfo::collectStridedAccess() indeed deserves a separate patch, being an optimization opportunity rather than preventing an assert -- its stride predicate is added early enough to be caught by CM.computeMaxVF()'s bailout conditions and avoid the assert, as the scev4stride1() test in the above X86/optsize.ll checks.