- User Since
- Jul 12 2015, 1:48 PM (235 w, 5 d)
Thu, Jan 16
This looks good to me, with a last minor nit, thanks!
Wed, Jan 15
This looks good to me, thanks!
Sun, Jan 12
Agree that it's better to merge the new conditional assume tests with the existing unconditional assume tests in the same file.
Thu, Jan 9
The LoopVectorizer/LAA has the ability to add runtime checks for memory accesses that look like they may be single stride accesses, in an attempt to still run vectorized code. This can happen in a boring matrix multiply kernel, for example:
This looks good to me, thanks!
Wed, Jan 8
This looks good to me, ship it. Added some final minor optional nits.
Tue, Jan 7
Sun, Jan 5
Thanks. Perhaps it's better for Legal to declare via getConditionalAssumes() what is being returned, and let LVP decide what to do with them(*), rather than to assume from the start via getAssumesToDrop() that they must all be dropped.
Thu, Jan 2
This looks good to me, adding a couple of minor last nits, thanks!
Tue, Dec 31
The LoopVectorizer/LAA has the ability to add runtime checks for memory accesses that look like they may be single stride accesses, in an attempt to still run vectorized code. This can happen in a boring matrix multiply kernel, for example: [snip]
However, if we have access to efficient vector gather loads, they are a much better option than vectorizing with runtime checks for a stride of 1.
This adds a check in LAA, the place that appears to dictate this, to see whether a MaskedGather or MaskedScatter would be legal.
Mon, Dec 30
Just pointing out for the record that this is extracted from D67905
Sat, Dec 28
Thu, Dec 26
This looks good to me, with a couple of final optional comments.
Thanks for making all the changes! More comments inline.
Looks good to me; an important step forward in teaching ILV to deal with VPValues, lifting dependencies on ingredient operands and other attributes, and simplifying recipes.
Mon, Dec 23
Currently 'createVectorizedLoopSkeleton' keeps track of newly generated blocks using local variables and reassigns them to the corresponding class fields at the end. That makes the code structure a bit harder to understand. This patch uses the class fields directly and removes the extra locals.
Fri, Dec 20
Dec 19 2019
This looks good to me, with a couple of final nits; thanks!
Dec 18 2019
No need for collectReductionInstructions() any more.
Dec 17 2019
Dec 8 2019
Add test(s) having multiple wrapping reduction instructions.
Dec 7 2019
Dec 6 2019
Looks good to me with minor optional notes.
Consider adding a target-independent test, e.g., augmenting first-order-recurrence.ll; tests can possibly be committed before the fix, demonstrating the current behavior and then the change brought by this fix.
Dec 5 2019
Still think it would be better to provide this as a standalone function in Transforms/Utils/LoopUtils, for the potential benefit of loop unroll (and jam) passes in addition to LV. Having agreed to ignore foldTail and requiresScalarEpilog, there's nothing vectorization-specific to do here. There's still an issue, though, with the fact that LV may use the scalar loop both for the remaining TC%(VF*UF) iterations when running the vector loop, and for all TC iterations when runtime guards bypass the vector loop. In the absence of information, each such guard could be assigned 0.5 probability, or one could be aggressively optimistic and hope the vector loop is always reached. In any case this deserves a comment.
Dec 3 2019
Dec 2 2019
Dec 1 2019
Simplified the test as suggested.
Nov 28 2019
Good catch, binary operations that perform reduction must indeed be vectorized w/o wrap flags.
Nov 27 2019
Cleaned up the #ifndef/LLVM_DEBUG and renamed the lambda.
Nov 26 2019
This looks good to me, thanks, plus a couple of comments.
Nov 20 2019
Ideally we should be using loop size and the expected perf gain from the cost model, in addition to loop hotness, to make the decision, but that would be a much bigger change and can be a follow-up step.
Adding a few comments. Would be good to generalize and apply also to loop unroll (and jam).
Nov 18 2019
Nov 15 2019
Nov 12 2019
Nov 9 2019
Nov 7 2019
As revert eaff300 shows, more care must be taken to avoid having one FOR (First Order Recurrence) phi sink an instruction that another FOR phi needs to keep in place. Namely, any Previous instruction that feeds a FOR phi across the backarc must not be allowed to sink, because that may break its dominance over other instructions that use the FOR phi. The original check against SinkAfter.count(Previous) ("Cannot rely on dominance due to motion") addresses this concern, but only partially; it works if the FOR phi that needs to sink an instruction is handled before the FOR phi that needs to keep it in place. But the two phis could be encountered in the other order.
Nov 2 2019
Thanks for coming back to look into this @fhahn !
Nov 1 2019
Oct 31 2019
Having special treatment for GEPs here does follow the unique way in which GEPs tolerate both scalar/invariant operands and/or vector operands.
Oct 29 2019
This looks good to me, plus some final minor comments, thanks!
Oct 22 2019
Oct 21 2019
Oct 13 2019
If conditional assumes are to be dropped, better do so on entry to VPlan, as in DeadInstructions, rather than representing them in ReplicateRecipe (as do unconditional assumes) and silencing their code generation.
Oct 9 2019
This LGTM, with additional CHECK-NOT to the test, thanks!
Oct 8 2019
Oct 5 2019
The other suggested fix in LoopAccessInfo::collectStridedAccess() indeed deserves a separate patch, being an optimization opportunity rather than preventing an assert -- its stride predicate is added early enough to be caught by CM.computeMaxVF()'s bailout conditions and avoid the assert, as the scev4stride1() test in X86/optsize.ll above checks.
Oct 4 2019
Changing PredicatedScalarEvolution::getAsAddRec() to consider hasOptSize sounds a bit strange to me; after all the transformation that created PSCEV should have considered this; adding @sanjoy for more thoughts on such a proposed SCEV change.
Sep 28 2019
Sep 26 2019
At this point, PSE.getUnionPredicate().getPredicates() returns 0
Sep 23 2019
This LGTM, thanks for fixing!
Sep 22 2019
So starting from D65197 or so, a loop that is forced to vectorize in a function that is OptForSize is expected to get vectorized even if it involves replicating the loop.
Another assert which presumably requires similar update is the "Cannot SCEV check stride or overflow when optimizing for size" one.
Sep 21 2019
Sep 20 2019
When we optimise for size and need to emit runtime checks to disambiguate memory, we should bail out nicely, not vectorise, and not run into an assert like we currently do.
Sep 18 2019
Sep 15 2019
Sep 14 2019
Sep 12 2019
Sep 11 2019
Thanks, just a minor nit.
Sep 8 2019
This LGTM, just some minor nits.
Sep 3 2019
Sep 2 2019
This LGTM too; added a couple of minor comments about comments.
Aug 29 2019
Aug 28 2019
Aug 27 2019
The original intent was to make sure that under OptForSize only a single (vector) loop will be produced, i.e., w/o a scalar loop serving either the leftover scalar iterations or as a runtime-guard bailout. There's another such assert in emitSCEVChecks(). Now that FoldTailByMasking no longer implies OptForSize, this should indeed be updated (to use CM_ScalarEpilogueNotAllowedOptSize instead, perhaps?).
Aug 26 2019
There's some obvious room for API cleanup in the patch, but before I iterate on that, I want to make sure I'm approaching the problem the "right way" and that my attempts at clarifying comments are actually correct. :)
Aug 25 2019
Aug 13 2019
LGTM, good catch!