Page MenuHomePhabricator

Ayal (Ayal Zaks)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 12 2015, 1:48 PM (235 w, 5 d)

Recent Activity

Thu, Jan 16

Ayal added inline comments to D67905: [LV] Vectorizer should adjust trip count in profile information.
Thu, Jan 16, 11:58 PM · Restricted Project
Ayal accepted D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..

This look good to me, with a last minor nit, thanks!

Thu, Jan 16, 11:49 PM · Restricted Project
Ayal added inline comments to D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..
Thu, Jan 16, 8:38 AM · Restricted Project

Wed, Jan 15

Ayal accepted D67905: [LV] Vectorizer should adjust trip count in profile information.

This looks good to me, thanks!

Wed, Jan 15, 4:03 PM · Restricted Project
Ayal added inline comments to D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..
Wed, Jan 15, 3:05 PM · Restricted Project

Sun, Jan 12

Ayal accepted D68814: [LV] Allow assume calls in predicated blocks..

Agree that it's better to merge the new conditional assume tests with the existing unconditional assume tests in the same file.

Sun, Jan 12, 3:41 AM · Restricted Project

Thu, Jan 9

Ayal added a comment to D71919: [LoopVectorize] Disable single stride access predicates when gather loads are available..

The LoopVectorizer/LAA has the ability to add runtime checks for memory accesses that look like they may be single stride accesses, in an attempt to still run vectorized code. This can happen in a boring matrix multiply kernel, for example:

Thu, Jan 9, 8:40 AM · Restricted Project
Ayal accepted D72387: [LoopVectorize][TTI] Add an isLegalMaskedLoadStore method. NFC.

Thanks!

Thu, Jan 9, 4:45 AM · Restricted Project
Ayal added inline comments to D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..
Thu, Jan 9, 12:53 AM · Restricted Project
Ayal accepted D72029: [LoopUtils][NFC] Minor refactoring in getLoopEstimatedTripCount..

This looks good to me, thanks!

Thu, Jan 9, 12:43 AM · Restricted Project

Wed, Jan 8

Ayal accepted D70865: [LV] VPValues for memory operation pointers (NFCI).

This looks good to me, ship it. Added some final minor optional nits.

Wed, Jan 8, 11:57 PM · Restricted Project

Tue, Jan 7

Ayal added inline comments to D71919: [LoopVectorize] Disable single stride access predicates when gather loads are available..
Tue, Jan 7, 2:35 PM · Restricted Project

Sun, Jan 5

Ayal added a comment to D68814: [LV] Allow assume calls in predicated blocks..

Thanks. Perhaps it's better for Legal to declare via getConditionalAssumes() what is being returning, and let LVP decide what to do with them(*), rather than to assume from the start via getAssumesToDrop() that they must all be dropped.

Sun, Jan 5, 12:30 PM · Restricted Project

Thu, Jan 2

Ayal accepted D71919: [LoopVectorize] Disable single stride access predicates when gather loads are available..

This looks good to me, adding a couple of minor last nits, thanks!

Thu, Jan 2, 4:24 AM · Restricted Project

Tue, Dec 31

Ayal added inline comments to D72029: [LoopUtils][NFC] Minor refactoring in getLoopEstimatedTripCount..
Tue, Dec 31, 5:38 AM · Restricted Project
Ayal added a comment to D71919: [LoopVectorize] Disable single stride access predicates when gather loads are available..

The LoopVectorizer/LAA has the ability to add runtime checks for memory accesses that look like they may be single stride accesses, in an attempt to still run vectorized code. This can happen in a boring matrix multiply kernel, for example: [snip]

However if we have access to efficient vector gather loads, they should be are a much better option than vectoizing with runtime checks for a stride of 1.

This adds a check into the place that appears to be dictating this, LAA, to check if the MaskedGather or MaskedScatter would be legal.

Tue, Dec 31, 3:46 AM · Restricted Project

Mon, Dec 30

Ayal added a comment to D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..

Just pointing out for the record that this is extracted from D67905

Mon, Dec 30, 4:31 AM · Restricted Project
Ayal added inline comments to D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..
Mon, Dec 30, 4:27 AM · Restricted Project

Sat, Dec 28

Ayal added a comment to D68814: [LV] Allow assume calls in predicated blocks..

[snip]

I vote to simply drop conditional assumes as a first step, at this time. [snip] Following D68577 such assumes could be dropped as a VPlan-to-VPlan transformation.

Sat, Dec 28, 9:06 AM · Restricted Project

Thu, Dec 26

Ayal accepted D71055: [LV][NFC] Some refactoring and renaming to facilitate next change..

This looks good to me, with a couple of final optional comments.

Thu, Dec 26, 10:42 AM · Restricted Project
Ayal added a comment to D67905: [LV] Vectorizer should adjust trip count in profile information.

Thanks for making all the changes! More comments inline.

Thu, Dec 26, 8:50 AM · Restricted Project
Ayal added a comment to D70865: [LV] VPValues for memory operation pointers (NFCI).

Looks good to me; important step forward teaching ILV to deal with VPValues, lifting dependencies on ingredient operands and other attributes, simplifying recipes.

Thu, Dec 26, 4:00 AM · Restricted Project

Mon, Dec 23

Ayal added a comment to D71055: [LV][NFC] Some refactoring and renaming to facilitate next change..

Currently 'createVectorizedLoopSkeleton' keeps track of new generated blocks using local variables and reassigns them to corresponding class fields at the end. That makes a bit harder to understand code structure. This patch uses class fields directly and extra locals got removed.

Mon, Dec 23, 1:31 PM · Restricted Project

Fri, Dec 20

Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

@Ayal : Thanks! Could you commit it for me, as I have no commit access yet?

Fri, Dec 20, 5:19 AM · Restricted Project
Ayal committed rGe498be573871: [LV] Strip wrap flags from vectorized reductions (authored by Ayal).
[LV] Strip wrap flags from vectorized reductions
Fri, Dec 20, 5:11 AM
Ayal closed D69563: [LV] Strip wrap flags from vectorized reductions.
Fri, Dec 20, 5:10 AM · Restricted Project

Dec 19 2019

Ayal accepted D69563: [LV] Strip wrap flags from vectorized reductions.

This looks good to me, with couple of final nits; thanks!

Dec 19 2019, 4:40 AM · Restricted Project

Dec 18 2019

Ayal added inline comments to D69563: [LV] Strip wrap flags from vectorized reductions.
Dec 18 2019, 2:38 PM · Restricted Project
Ayal added inline comments to D67905: [LV] Vectorizer should adjust trip count in profile information.
Dec 18 2019, 2:00 PM · Restricted Project
Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

No need for collectReductionInstructions() any more.

Dec 18 2019, 12:24 PM · Restricted Project

Dec 17 2019

Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

Probably not too much important because should be handled by the vector predicated instructions/intrinsics, but still

Good catch, binary operations that perform reduction must indeed be vectorized w/o wrap flags.

But this should apply to all such operations that participate in the vectorized part of the loop. Note that
(1) there may be several such add/sub instructions, as in llvm/test/Transforms/LoopVectorize/reduction.ll tests, and

Is there some existing API to find them all? Or I need to invite my own?

AFAIK such an API does not currently exist.

Would not it be easier just to not copy wrap flags in widenInstruction() for all instructions [which I was shy to do initially :) ] or it is too aggressive?

Loosing all wrap flags would be too aggressive.

Why is it ok not to drop nuw here:

define i8 @function0(i8 %a) {
entry:
  br label %for.body

for.body:
  %indvars.iv = phi i32 [ 0, %entry ], [ %indvars.iv.next, %if.end ]
  %cmp5 = icmp ult i8 %a, 127
  br i1 %cmp5, label %if.then, label %if.end

if.then:
  %mul = mul nuw i8 %a, 2
  br label %if.end

if.end:
  %k = phi i8 [ %mul, %if.then ], [ %a, %for.body ]
  %indvars.iv.next = add i32 %indvars.iv, 1
  %cmp = icmp slt i32 %indvars.iv.next, 42
  br i1 %cmp, label %for.body, label %for.end

for.end:
  ret i8 undef
}

Vector code generated is

vector.ph:                                        ; preds = %entry
  %broadcast.splatinsert1 = insertelement <4 x i8> undef, i8 %a, i32 0
  %broadcast.splat2 = shufflevector <4 x i8> %broadcast.splatinsert1, <4 x i8> undef, <4 x i32> zeroinitializer
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0
  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
  %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>
  %0 = add i32 %index, 0
  %1 = icmp ult <4 x i8> %broadcast.splat2, <i8 -128, i8 -128, i8 -128, i8 -128>
  %2 = mul nuw <4 x i8> %broadcast.splat2, <i8 2, i8 2, i8 2, i8 2>                                  ; if %a == 200, this is poison...
  %3 = xor <4 x i1> %1, <i1 true, i1 true, i1 true, i1 true>
  %predphi = select <4 x i1> %3, <4 x i8> %broadcast.splat2, <4 x i8> %2                     ; ... even though the %predphi == %a broadcasted, it's still poison as it depends on %2 (according to https://llvm.org/docs/LangRef.html#poisonvalues)
  %index.next = add i32 %index, 4
  %4 = icmp eq i32 %index.next, 40
  br i1 %4, label %middle.block, label %vector.body, !llvm.loop !0

Do I miss anything important here that allows us not to drop "nuw" flags?

Dec 17 2019, 11:58 PM · Restricted Project
Ayal added inline comments to D69563: [LV] Strip wrap flags from vectorized reductions.
Dec 17 2019, 12:28 AM · Restricted Project

Dec 8 2019

Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

Add test(s) having multiple wrapping reduction instructions.

Dec 8 2019, 11:48 AM · Restricted Project

Dec 7 2019

Ayal added a comment to D71071: [LV] Pick correct BB as insert point when fixing PHI for FORs..

Looks good to me with minor optional notes.
Consider adding a target-indepedent test, e.g., augmenting first-order-recurrence.ll; tests can possibly be committed before the fix, demonstrating current behavior and then the change brought by this fix.

I tried, but the test case requires tail folding and it seems to depend on an actual target. We make the decision to tail-fold in computeMaxVF (https://github.com/llvm/llvm-project/blob/c49194969430f0ee817498a7000a979a7a0ded03/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L4941) and use the max target VF, not the forced vector width. This seems a bit odd and we might want to change that, to make it easier to test tail-folding without any specific target. What do you think?

Dec 7 2019, 1:13 PM · Restricted Project

Dec 6 2019

Ayal added inline comments to D71071: [LV] Pick correct BB as insert point when fixing PHI for FORs..
Dec 6 2019, 1:51 AM · Restricted Project
Ayal accepted D71071: [LV] Pick correct BB as insert point when fixing PHI for FORs..

Looks good to me with minor optional notes.
Consider adding a target-indepedent test, e.g., augmenting first-order-recurrence.ll; tests can possibly be committed before the fix, demonstrating current behavior and then the change brought by this fix.

Dec 6 2019, 1:51 AM · Restricted Project

Dec 5 2019

Ayal added a comment to D67905: [LV] Vectorizer should adjust trip count in profile information.

Still think it would be better to provide this as a standalone function in Transforms/Utils/LoopUtils, for potential benefit of loop unroll (and jam) passes in addition to LV. Having agreed to ignore foldTail and requiresScalarEpilog, there's nothing vectorization-specific to do here. There's still an issue though with the fact that LV may use the scalar loop for both the remaining TC%(VF*UF) iterations when running the vector loop, and for all TC iterations when runtime guards bypass the vector loop. In absence of information, each such guard could be assigned 0.5 probability, or one could be aggressively optimistic and hope vector loop is always reached. In any case this deserves a comment.

Dec 5 2019, 5:00 AM · Restricted Project
Ayal accepted D69067: [LV] Record GEP widening decisions in recipe (NFCI).
Dec 5 2019, 1:43 AM · Restricted Project

Dec 3 2019

Ayal committed rG6ed9cef25f91: [LV] Scalar with predication must not be uniform (authored by Ayal).
[LV] Scalar with predication must not be uniform
Dec 3 2019, 9:56 AM
Ayal closed D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.
Dec 3 2019, 9:56 AM · Restricted Project

Dec 2 2019

Ayal added a comment to D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

It looks like this also fixes https://bugs.llvm.org/show_bug.cgi?id=43951. The reproducer there might be a good source for a test case as well.

Dec 2 2019, 3:02 PM · Restricted Project

Dec 1 2019

Ayal added a comment to D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

In the test case we predicate due to low trip count right? It might be worth calling that out.

Dec 1 2019, 3:08 PM · Restricted Project
Ayal updated the diff for D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

Simplified the test as suggested.

Dec 1 2019, 2:56 PM · Restricted Project
Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

Good catch, binary operations that perform reduction must indeed be vectorized w/o wrap flags.

But this should apply to all such operations that participate in the vectorized part of the loop. Note that
(1) there may be several such add/sub instructions, as in llvm/test/Transforms/LoopVectorize/reduction.ll tests, and

Is there some existing API to find them all? Or I need to invite my own?

Dec 1 2019, 8:23 AM · Restricted Project

Nov 28 2019

Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

Good catch, binary operations that perform reduction must indeed be vectorized w/o wrap flags.

Nov 28 2019, 8:05 AM · Restricted Project

Nov 27 2019

Ayal added a comment to D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

and rebased.

Nov 27 2019, 12:03 PM · Restricted Project
Ayal updated the diff for D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

Cleaned-up the #ifndef/LLVM_DEBUG and renamed the lambda.

Nov 27 2019, 12:03 PM · Restricted Project

Nov 26 2019

Ayal added a comment to D69067: [LV] Record GEP widening decisions in recipe (NFCI).

This looks good to me, thanks, plus a couple of comments.

Nov 26 2019, 8:16 AM · Restricted Project

Nov 20 2019

Ayal added a comment to D67805: [LV] Allow vectorization of hot short trip count loops with epilog.

Ideally we should be using loop size and expected perf gain from cost model in addition to loop hotness to make the decision but that would be much bigger change and can be a follow up step.

Nov 20 2019, 2:27 PM · Restricted Project
Ayal added a comment to D67905: [LV] Vectorizer should adjust trip count in profile information.

Adding a few comments. Would be good to generalize and apply also to loop unroll (and jam).

Nov 20 2019, 3:50 AM · Restricted Project

Nov 18 2019

Ayal added a comment to D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

Nice! This indeed seems to solve the problem we saw in PR40816.

I don't know the code but I can at least provide some additional testing by including this patch in my weekend testing to see nothing unexpected pops up there.

Testing went well!

Nov 18 2019, 1:43 AM · Restricted Project

Nov 15 2019

Ayal created D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.
Nov 15 2019, 2:24 AM · Restricted Project

Nov 12 2019

Ayal added a comment to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).
  1. Such loops can and should be optimized to have a simple induction variable with constant trip count regardless of (and prior to) LV.

indvars does this transform. That said, it only runs once; in some cases, we manage to simplify the loop after indvars runs. Maybe it makes sense to run indvars again? We'd want to measure how much it actually triggers in practice.

Nov 12 2019, 12:19 PM · Restricted Project

Nov 9 2019

Ayal added a comment to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).

Testcase follows; reproduce with opt -loop-vectorize:

[snip]

Nov 9 2019, 2:22 PM · Restricted Project

Nov 7 2019

Ayal updated subscribers of D69228: [LV] Generalize conditions for sinking instrs for first order recurrences..

As revert eaff300 shows, more care must be taken to avoid having one FOR (First Order Recurring) phi sink an instruction that another FOR phi needs to keep in place. Namely, any Previous instruction that feeds a FOR phi across the backarc must not be allowed to sink, because that may break its dominance on other instructions who use the FOR phi. The original check against SinkAfter.count(Previous)) // Cannot rely on dominance due to motion addresses this concern but only partially; it works if the FOR phi that needs to sink an instruction is handled before the FOR phi that needs to keep it in place. But the two phi's could be encountered in the other order.

Nov 7 2019, 3:04 PM · Restricted Project

Nov 2 2019

Ayal added a comment to D68831: [LV] Mark instructions with loop invariant arguments as uniform. (WIP).

Thanks for coming back to look into this @fhahn !

Nov 2 2019, 4:35 PM · Restricted Project
Ayal added inline comments to D69228: [LV] Generalize conditions for sinking instrs for first order recurrences..
Nov 2 2019, 3:04 PM · Restricted Project

Nov 1 2019

Ayal added a comment to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).

Any other concern, guys?

Nov 1 2019, 2:04 PM · Restricted Project

Oct 31 2019

Ayal added a comment to D69067: [LV] Record GEP widening decisions in recipe (NFCI).

Having special treatment for GEPs here does follow the unique way in which GEPs tolerate both scalar/invariant operands and/or vector operands.

Oct 31 2019, 8:00 AM · Restricted Project

Oct 29 2019

Ayal accepted D69228: [LV] Generalize conditions for sinking instrs for first order recurrences..

This looks good to me, plus some final minor comments, thanks!

Oct 29 2019, 12:19 PM · Restricted Project

Oct 22 2019

Ayal added inline comments to D69228: [LV] Generalize conditions for sinking instrs for first order recurrences..
Oct 22 2019, 3:17 AM · Restricted Project

Oct 21 2019

Ayal added a comment to D68814: [LV] Allow assume calls in predicated blocks..

If conditional assumes are to be dropped, better do so on entry to VPlan, as in DeadInstructions, rather than representing them in ReplicateRecipe (as do unconditional assumes) and silencing their code generation.

To retain conditional assumes along with their control flow, they could be marked under isScalarWithPredication; but this complicates vectorization, plus what use are such assumes when all else is if-converted(?)

Conditional assumes under uniform control flow could be retained, along with the uniform control flow they depend upon; this may be mostly relevant for outerloop vectorization.

I thought it might be desirable to only drop the calls at the point where we know for certain that we cannot preserve them: when we emit them in a block that gets merge in its predecessor. By doing it in the recipe, it should also be applicable to the outer loop vectorization. Alternatively we can drop them all as a first step, if that is preferred. Please let me know :)

Oct 21 2019, 1:54 PM · Restricted Project

Oct 13 2019

Ayal added inline comments to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).
Oct 13 2019, 2:17 PM · Restricted Project
Ayal added inline comments to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).
Oct 13 2019, 4:26 AM · Restricted Project
Ayal added a comment to D68814: [LV] Allow assume calls in predicated blocks..

If conditional assumes are to be dropped, better do so on entry to VPlan, as in DeadInstructions, rather than representing them in ReplicateRecipe (as do unconditional assumes) and silencing their code generation.

Oct 13 2019, 3:32 AM · Restricted Project

Oct 9 2019

Ayal added inline comments to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).
Oct 9 2019, 9:21 AM · Restricted Project
Ayal accepted D68082: [LV] Emitting SCEV checks with OptForSize.

This LGTM, with additional CHECK-NOT to the test, thanks!

Oct 9 2019, 5:49 AM · Restricted Project

Oct 8 2019

Ayal added a comment to D68082: [LV] Emitting SCEV checks with OptForSize.

Comments addressed:

  • cleaned up the test case a bit.
Oct 8 2019, 2:33 PM · Restricted Project

Oct 5 2019

Ayal added a comment to D68082: [LV] Emitting SCEV checks with OptForSize.

The other suggested fix in LoopAccessInfo::collectStridedAccess() indeed deserves a separate patch, being an optimization opportunity rather than preventing an assert -- it's stride predicate is added early enough to be caught by CM.computeMaxVF()'s bailout conditions and avoid assert, as scev4stride1() test in above X86/optsize.ll checks.

Oct 5 2019, 11:10 AM · Restricted Project

Oct 4 2019

Ayal updated subscribers of D68082: [LV] Emitting SCEV checks with OptForSize.

Changing PredicatedScalarEvolution::getAsAddRec() to consider hasOptSize sounds a bit strange to me; after all the transformation that created PSCEV should have considered this; adding @sanjoy for more thoughts on such a proposed SCEV change.

Oct 4 2019, 8:05 AM · Restricted Project

Sep 28 2019

Ayal added inline comments to D68082: [LV] Emitting SCEV checks with OptForSize.
Sep 28 2019, 8:03 AM · Restricted Project

Sep 26 2019

Ayal added a comment to D68082: [LV] Emitting SCEV checks with OptForSize.

At this point, PSE.getUnionPredicate().getPredicates() returns 0

Sep 26 2019, 2:45 PM · Restricted Project

Sep 23 2019

Ayal accepted D67764: [LV] Forced vectorization with runtime checks and OptForSize.

This LGTM, thanks for fixing!

Sep 23 2019, 5:36 PM · Restricted Project
Ayal added inline comments to D67764: [LV] Forced vectorization with runtime checks and OptForSize.
Sep 23 2019, 11:54 AM · Restricted Project
Ayal added inline comments to D67764: [LV] Forced vectorization with runtime checks and OptForSize.
Sep 23 2019, 7:37 AM · Restricted Project

Sep 22 2019

Ayal added a comment to D67764: [LV] Forced vectorization with runtime checks and OptForSize.

So starting from D65197 or so, a loop that is forced to vectorize in a function that is OptForSize, is expected to get vectorized even if it involves replicating the loop.
Another assert which presumably requires similar update is the "Cannot SCEV check stride or overflow when optimizing for size" one.

Sep 22 2019, 9:19 PM · Restricted Project

Sep 21 2019

Ayal added a comment to D67764: [LV] Forced vectorization with runtime checks and OptForSize.

Better to make the assert valid by checking !forcedToVectorize in addition to !OptForSize.

Sep 21 2019, 5:43 PM · Restricted Project
Ayal added a comment to D67764: [LV] Forced vectorization with runtime checks and OptForSize.

Ah, I had completely missed that this is about the interaction of OptForSize and the loop.vectorize loop hint!
Vectorization is forced, which means that we vectorize even though OptForSize is set and we need to emit runtime checks.

The symptom is that we run in this assert in emitMemRuntimeChecks():

assert(!BB->getParent()->hasOptSize() &&
     "Cannot emit memory checks when optimizing for size");

In getScalarEpilogueLowering(), we do not set CM_ScalarEpilogueNotAllowedOptSize because of the loop hint:

if (Hints.getForce() != LoopVectorizeHints::FK_Enabled &&
    (F->hasOptSize() ||
     llvm::shouldOptimizeForSize(L->getHeader(), PSI, BFI))) {
  SEL = CM_ScalarEpilogueNotAllowedOptSize;

And because CM_ScalarEpilogueNotAllowedOptSize is not set, we do not check runtimeChecksRequired, so that we end up with code growth under OptForSize because of the loop hint.

If we agree that this is desired behaviour, which looks reasonable to me because of this condition, I am going to remove the assert as that is not valid.

Sep 21 2019, 5:42 PM · Restricted Project
Ayal added a comment to D67764: [LV] Forced vectorization with runtime checks and OptForSize.

Which commit does this trace back to, D65197? Why is the current logic placed too late, and should it be removed if moved earlier as suggested? Add a reference to the problem @uabelho reported.

Sep 21 2019, 8:09 AM · Restricted Project

Sep 20 2019

Ayal added a comment to D67764: [LV] Forced vectorization with runtime checks and OptForSize.

When we optimise for size, and need to emit runtime checks to disambiguate memory, we should nicely bail, don't vectorise, and not run in an assert like we currently do.

Sep 20 2019, 9:14 PM · Restricted Project

Sep 18 2019

Ayal added inline comments to D67690: [LV][NFC] Factor out calculation of "best" estimated trip count..
Sep 18 2019, 1:57 AM · Restricted Project

Sep 15 2019

Ayal added inline comments to D67510: [LV] Support gaps, overlaps, and inexact sizes in speculation logic.
Sep 15 2019, 2:04 AM · Restricted Project

Sep 14 2019

Ayal added inline comments to D67510: [LV] Support gaps, overlaps, and inexact sizes in speculation logic.
Sep 14 2019, 9:53 AM · Restricted Project

Sep 12 2019

Ayal added inline comments to D67510: [LV] Support gaps, overlaps, and inexact sizes in speculation logic.
Sep 12 2019, 2:16 PM · Restricted Project

Sep 11 2019

Ayal added a comment to D66688: [LoopVectorize] Leverage speculation safety to avoid masked.loads.

...
p.s. There was one comment in previous review about needing to handle the interleave case. I'm interpreting that as a request for further optimization (i.e. broadened scope) and have left it for future work. If it was actually a correctness concern, then please explain more.

Sep 11 2019, 1:00 PM · Restricted Project
Ayal accepted D67372: [LV] Support invariant addresses in speculation logic.

Thanks, just a minor nit.

Sep 11 2019, 12:48 PM · Restricted Project

Sep 8 2019

Ayal added a comment to D66688: [LoopVectorize] Leverage speculation safety to avoid masked.loads.

This LGTM, just some minor nits.

Sep 8 2019, 11:22 AM · Restricted Project

Sep 3 2019

Ayal accepted D66803: [LV] Tail-folding with runtime memory checks.
Sep 3 2019, 12:37 AM · Restricted Project
Ayal accepted D66932: [LV] Tail-folding, runtime scev checks.
Sep 3 2019, 12:36 AM · Restricted Project

Sep 2 2019

Ayal accepted D67074: [LV] Fix miscompiles by adding non-header PHI nodes to AllowedExit.

This LGTM too; added a couple of minor comments about comments.

Sep 2 2019, 1:58 PM · Restricted Project

Aug 29 2019

Ayal added inline comments to D66932: [LV] Tail-folding, runtime scev checks.
Aug 29 2019, 3:34 AM · Restricted Project
Ayal added a comment to D66803: [LV] Tail-folding with runtime memory checks.

I have addressed the other assert in D66932.

Aug 29 2019, 3:29 AM · Restricted Project

Aug 28 2019

Ayal committed rGd15df0ede589: [LV] Fold tail by masking - handle reductions (authored by Ayal).
[LV] Fold tail by masking - handle reductions
Aug 28 2019, 2:02 AM
Ayal added a comment to D66803: [LV] Tail-folding with runtime memory checks.

Shall we address this separately in a different patch? I am looking at this now, and feel that I have embarked on a little SCEV adventure, and that this is a separate issue.

Aug 28 2019, 1:50 AM · Restricted Project

Aug 27 2019

Ayal added a comment to D66803: [LV] Tail-folding with runtime memory checks.

The original intent was to make sure that under OptForSize only a single (vector) loop will be produced. I.e., w/o a scalar loop serving either as scalar leftover iterations or as runtime guard bailout. There's another such assert in emitSCEVChecks(). Now that FoldTailByMasking no longer implies OptForSize this should indeed be updated (to use CM_ScalarEpilogueNotAllowedOptSize instead perhaps?).

Aug 27 2019, 8:33 AM · Restricted Project
Ayal added a comment to D66688: [LoopVectorize] Leverage speculation safety to avoid masked.loads.

Rebase over landed tests (including a bunch of negative ones).

Note that this reveals a bug in the patch. Will fix, and iterate from there.

Expect several rounds of incremental improvements before ready for real review.

Aug 27 2019, 2:31 AM · Restricted Project

Aug 26 2019

Ayal added a comment to D66688: [LoopVectorize] Leverage speculation safety to avoid masked.loads.

There's some obvious room for API cleanup in the patch, but before I iterate on that, I want to make sure I'm approaching the problem the "right way" and that my attempt at clarifying comments are actually correct. :)

Aug 26 2019, 4:55 AM · Restricted Project

Aug 25 2019

Ayal created D66720: [LV] Fold tail by masking - handle reductions.
Aug 25 2019, 10:33 AM · Restricted Project

Aug 13 2019

Ayal accepted D66106: [LV] fold-tail predication should be respected even with assume_safety .

LGTM, good catch!

Aug 13 2019, 2:01 PM · Restricted Project

Jul 15 2019

Ayal added inline comments to D59995: [LV] Exclude loop-invariant inputs from scalar cost computation..
Jul 15 2019, 3:27 AM · Restricted Project