Page MenuHomePhabricator

Ayal (Ayal Zaks)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 12 2015, 1:48 PM (247 w, 2 d)

Recent Activity

Today

Ayal added inline comments to D76838: [LV][LoopInfo] Transform counting-down loops to counting-up loop.
Tue, Apr 7, 3:45 AM
Ayal created D77635: [LV] Vectorize with FoldTail when Primary Induction is absent.
Tue, Apr 7, 2:40 AM · Restricted Project
Ayal added a comment to D77635: [LV] Vectorize with FoldTail when Primary Induction is absent.

@SjoerdMeijer , test tail-folding-counting-down.ll introduced in D72324 now fails, as it can be vectorized with fold-tail, but is not vectorized due to cost. What's the intention of this test and how should it be changed?

Tue, Apr 7, 2:40 AM · Restricted Project
Ayal added a comment to D77539: [LV] Add VPValue operands to VPBlendRecipe (NFCI).

minor comments, looks good to me, thanks.

Tue, Apr 7, 2:10 AM · Restricted Project

Yesterday

Ayal added inline comments to D75746: [LoopVectorizer] Simplify branch in the remainder loop for trivial cases.
Mon, Apr 6, 1:04 PM · Restricted Project

Sun, Apr 5

Ayal added inline comments to D77467: [VPlan] Introduce new VPWidenCallRecipe (NFC)..
Sun, Apr 5, 5:19 AM · Restricted Project
Ayal added inline comments to D77467: [VPlan] Introduce new VPWidenCallRecipe (NFC)..
Sun, Apr 5, 4:15 AM · Restricted Project
Ayal added inline comments to D77467: [VPlan] Introduce new VPWidenCallRecipe (NFC)..
Sun, Apr 5, 2:39 AM · Restricted Project

Fri, Apr 3

Ayal added inline comments to D75746: [LoopVectorizer] Simplify branch in the remainder loop for trivial cases.
Fri, Apr 3, 7:29 AM · Restricted Project
Ayal added inline comments to D76838: [LV][LoopInfo] Transform counting-down loops to counting-up loop.
Fri, Apr 3, 5:20 AM

Thu, Apr 2

Ayal added inline comments to D76838: [LV][LoopInfo] Transform counting-down loops to counting-up loop.
Thu, Apr 2, 5:22 PM

Wed, Apr 1

Ayal added inline comments to D76838: [LV][LoopInfo] Transform counting-down loops to counting-up loop.
Wed, Apr 1, 9:20 AM
Ayal added inline comments to D76838: [LV][LoopInfo] Transform counting-down loops to counting-up loop.
Wed, Apr 1, 4:25 AM
Ayal added a comment to D76992: [VPlan] Add & use VPValue operands for VPWidenRecipe (NFC)..

Thanks for following-up quickly. The part that extends VPWidenRecipe to have a User for holding its VPValue operands looks ok, but could the patch make sure that these operands are indeed used correctly by ILV::widenInstruction(), instead of moving the latter over to its execute()? The former may involve refactoring the case that deals with Calls, to record cost-based UseVectorInstrinsic/NeedToScalarize decisions when constructing the VPlan/recipe rather than during codegen; to record InvariantCond of Selects instead of accessing getSCEV(I.getOperand(0)) during codegen, possibly cleaning up to generate ScalarCond OR Cond but not both. In any case, accesses to I’s operands should all change to access User's operands instead, possibly with their underlying ingredient if needed.

The main reason for moving it into the recipe is so we can use getUnderlyingValue when lowering calls. A cleaner solution to deal with call would be to move them to their own recipe/instruction and record the required info as suggested. Is that along the line you suggest? I.e. something like the following: Move calls to their own recipe (NFC) and then move VPWidenRecipe and VPCallRecipe to VPUser in parallel?

Without changes to the call handling, I am not sure how we would be able to use the existing logic (unless we expose getUnderlyingValue to ILV.

Wed, Apr 1, 1:36 AM · Restricted Project

Mon, Mar 30

Ayal added a comment to D76992: [VPlan] Add & use VPValue operands for VPWidenRecipe (NFC)..

Thanks for following-up quickly. The part that extends VPWidenRecipe to have a User for holding its VPValue operands looks ok, but could the patch make sure that these operands are indeed used correctly by ILV::widenInstruction(), instead of moving the latter over to its execute()? The former may involve refactoring the case that deals with Calls, to record cost-based UseVectorInstrinsic/NeedToScalarize decisions when constructing the VPlan/recipe rather than during codegen; to record InvariantCond of Selects instead of accessing getSCEV(I.getOperand(0)) during codegen, possibly cleaning up to generate ScalarCond OR Cond but not both. In any case, accesses to I’s operands should all change to access User's operands instead, possibly with their underlying ingredient if needed.

Mon, Mar 30, 3:51 PM · Restricted Project

Thu, Mar 26

Ayal added a comment to D76838: [LV][LoopInfo] Transform counting-down loops to counting-up loop.

It would indeed be good to have FoldTail handle loops that currently lack a Primary Induction Variable (0, +1). Such an %iv is needed in order to build the "%iv < %TripCount" comparison for masking.
In order to handle loops with reversed (%TripCount, -1) induction variables %riv instead of a PIV, this comparison should use %iv = %TripCount - %riv, or simply compare "%riv > 0" instead.
Note that LV eventually always builds a new PIV to control the vector loop, so best use it instead of an %riv; but this PIV is not (yet) represented in VPlan directly, i.e., in the absence of an original PIV.
There are however non-primary cases other than reversed whose tail would also be good to fold, e.g., when the start index is non-zero or when SCEV can somehow determine the trip count as in PR40816.

Thu, Mar 26, 10:51 AM
Ayal accepted D76686: [LV] widenIntOrFpInduction. NFC..

This looks good to me, thanks. The title should be more descriptive, e.g., "[LV] Refactor widenIntOrFpInduction, NFC".
Please wait a day or so to see if @fhahn or @samparker have further comments.

Thu, Mar 26, 3:12 AM · Restricted Project

Wed, Mar 25

Ayal added inline comments to D76686: [LV] widenIntOrFpInduction. NFC..
Wed, Mar 25, 5:53 PM · Restricted Project
Ayal added inline comments to D76686: [LV] widenIntOrFpInduction. NFC..
Wed, Mar 25, 10:15 AM · Restricted Project

Tue, Mar 24

Ayal added a comment to D76373: [LV] Replace stored value with a VPValue (NFCI).

This looks good to me too, thanks, with minor nit.

Tue, Mar 24, 11:16 AM · Restricted Project

Fri, Mar 20

Ayal added inline comments to D75746: [LoopVectorizer] Simplify branch in the remainder loop for trivial cases.
Fri, Mar 20, 5:55 PM · Restricted Project

Wed, Mar 18

Ayal accepted D76375: [VPlan] Do not print mapping for Value2VPValue..

Minor nit.
Thanks for following up quickly on https://reviews.llvm.org/D76200#inline-694794

Wed, Mar 18, 1:35 PM · Restricted Project
Ayal accepted D76374: [VPlan] Record underlying value for VPValues created by addVPValue (NFC)..

Thanks for following-up quickly on https://reviews.llvm.org/D76200#inline-696071

Wed, Mar 18, 1:02 PM · Restricted Project
Ayal accepted D76200: [VPlan] Use underlying value for printing, if available..

This looks good to me, thanks!
Added some minor final comments.

Wed, Mar 18, 9:47 AM · Restricted Project

Tue, Mar 17

Ayal added inline comments to D76200: [VPlan] Use underlying value for printing, if available..
Tue, Mar 17, 9:05 AM · Restricted Project

Sun, Mar 15

Ayal added inline comments to D76200: [VPlan] Use underlying value for printing, if available..
Sun, Mar 15, 1:55 PM · Restricted Project

Sat, Mar 14

Ayal added a comment to D75746: [LoopVectorizer] Simplify branch in the remainder loop for trivial cases.

Eliminating a redundant back-edge is clearly good. It would be better if such a special-case cleanup could be handled by some subsequent optimization, and possibly generalized. Note however that the remainder loop may or may not be considered subject for further optimization: LV currently disables unrolling the remainder loop following r231631, OTOH it may be worth vectorizing according to D30247. If desired, optimizations other than vectorization should preferably be taken care of by subsequent passes such as indvars and loop-unroll. LV knows that the trip count of the remainder loop is at most VF*UF, in the absence of overflows and runtime guards, and can make an effort to convey this bound, if GVN and IPSCCP fail to pick it up (referring to PR44547). Unfortunately, introducing an llvm.assume() seems insufficient - perhaps it could be made to work?

Sat, Mar 14, 7:29 AM · Restricted Project

Wed, Mar 11

Ayal added a comment to D74695: [VPlan] Replace VPWidenRecipe with VPInstruction (WIP)..

Breaking the recipes into VPInstructions is indeed on the roadmap for evolving VPlan to support transformations, cost modeling, and more.

Wed, Mar 11, 3:48 AM · Restricted Project

Jan 30 2020

Ayal added a comment to D73490: [LV] Remove nondeterminacy by changing LoopVectorizationLegality::Reductions from DenseMap to MapVector .

Culprit is the following from LoopVectorize.cpp:

Jan 30 2020, 12:54 AM · Restricted Project

Jan 22 2020

Ayal accepted D73079: [LV] Fix predication for branches with matching true and false succs..

Ship it!

Jan 22 2020, 1:09 PM · Restricted Project
Ayal added inline comments to D73079: [LV] Fix predication for branches with matching true and false succs..
Jan 22 2020, 10:05 AM · Restricted Project

Jan 16 2020

Ayal added inline comments to D67905: [LV] Vectorizer should adjust trip count in profile information.
Jan 16 2020, 11:58 PM · Restricted Project
Ayal accepted D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..

This look good to me, with a last minor nit, thanks!

Jan 16 2020, 11:49 PM · Restricted Project
Ayal added inline comments to D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..
Jan 16 2020, 8:38 AM · Restricted Project

Jan 15 2020

Ayal accepted D67905: [LV] Vectorizer should adjust trip count in profile information.

This looks good to me, thanks!

Jan 15 2020, 4:03 PM · Restricted Project
Ayal added inline comments to D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..
Jan 15 2020, 3:05 PM · Restricted Project

Jan 12 2020

Ayal accepted D68814: [LV] Allow assume calls in predicated blocks..

Agree that it's better to merge the new conditional assume tests with the existing unconditional assume tests in the same file.

Jan 12 2020, 3:41 AM · Restricted Project

Jan 9 2020

Ayal added a comment to D71919: [LoopVectorize] Disable single stride access predicates when gather loads are available..

The LoopVectorizer/LAA has the ability to add runtime checks for memory accesses that look like they may be single stride accesses, in an attempt to still run vectorized code. This can happen in a boring matrix multiply kernel, for example:

Jan 9 2020, 8:40 AM · Restricted Project
Ayal accepted D72387: [LoopVectorize][TTI] Add an isLegalMaskedLoadStore method. NFC.

Thanks!

Jan 9 2020, 4:45 AM · Restricted Project
Ayal added inline comments to D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..
Jan 9 2020, 12:53 AM · Restricted Project
Ayal accepted D72029: [LoopUtils][NFC] Minor refactoring in getLoopEstimatedTripCount..

This looks good to me, thanks!

Jan 9 2020, 12:43 AM · Restricted Project

Jan 8 2020

Ayal accepted D70865: [LV] VPValues for memory operation pointers (NFCI).

This looks good to me, ship it. Added some final minor optional nits.

Jan 8 2020, 11:57 PM · Restricted Project

Jan 7 2020

Ayal added inline comments to D71919: [LoopVectorize] Disable single stride access predicates when gather loads are available..
Jan 7 2020, 2:35 PM · Restricted Project

Jan 5 2020

Ayal added a comment to D68814: [LV] Allow assume calls in predicated blocks..

Thanks. Perhaps it's better for Legal to declare via getConditionalAssumes() what is being returning, and let LVP decide what to do with them(*), rather than to assume from the start via getAssumesToDrop() that they must all be dropped.

Jan 5 2020, 12:30 PM · Restricted Project

Jan 2 2020

Ayal accepted D71919: [LoopVectorize] Disable single stride access predicates when gather loads are available..

This looks good to me, adding a couple of minor last nits, thanks!

Jan 2 2020, 4:24 AM · Restricted Project

Dec 31 2019

Ayal added inline comments to D72029: [LoopUtils][NFC] Minor refactoring in getLoopEstimatedTripCount..
Dec 31 2019, 5:38 AM · Restricted Project
Ayal added a comment to D71919: [LoopVectorize] Disable single stride access predicates when gather loads are available..

The LoopVectorizer/LAA has the ability to add runtime checks for memory accesses that look like they may be single stride accesses, in an attempt to still run vectorized code. This can happen in a boring matrix multiply kernel, for example: [snip]

However if we have access to efficient vector gather loads, they should be are a much better option than vectoizing with runtime checks for a stride of 1.

This adds a check into the place that appears to be dictating this, LAA, to check if the MaskedGather or MaskedScatter would be legal.

Dec 31 2019, 3:46 AM · Restricted Project

Dec 30 2019

Ayal added a comment to D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..

Just pointing out for the record that this is extracted from D67905

Dec 30 2019, 4:31 AM · Restricted Project
Ayal added inline comments to D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount..
Dec 30 2019, 4:27 AM · Restricted Project

Dec 28 2019

Ayal added a comment to D68814: [LV] Allow assume calls in predicated blocks..

[snip]

I vote to simply drop conditional assumes as a first step, at this time. [snip] Following D68577 such assumes could be dropped as a VPlan-to-VPlan transformation.

Dec 28 2019, 9:06 AM · Restricted Project

Dec 26 2019

Ayal accepted D71055: [LV][NFC] Some refactoring and renaming to facilitate next change..

This looks good to me, with a couple of final optional comments.

Dec 26 2019, 10:42 AM · Restricted Project
Ayal added a comment to D67905: [LV] Vectorizer should adjust trip count in profile information.

Thanks for making all the changes! More comments inline.

Dec 26 2019, 8:50 AM · Restricted Project
Ayal added a comment to D70865: [LV] VPValues for memory operation pointers (NFCI).

Looks good to me; important step forward teaching ILV to deal with VPValues, lifting dependencies on ingredient operands and other attributes, simplifying recipes.

Dec 26 2019, 4:00 AM · Restricted Project

Dec 23 2019

Ayal added a comment to D71055: [LV][NFC] Some refactoring and renaming to facilitate next change..

Currently 'createVectorizedLoopSkeleton' keeps track of new generated blocks using local variables and reassigns them to corresponding class fields at the end. That makes a bit harder to understand code structure. This patch uses class fields directly and extra locals got removed.

Dec 23 2019, 1:31 PM · Restricted Project

Dec 20 2019

Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

@Ayal : Thanks! Could you commit it for me, as I have no commit access yet?

Dec 20 2019, 5:19 AM · Restricted Project
Ayal committed rGe498be573871: [LV] Strip wrap flags from vectorized reductions (authored by Ayal).
[LV] Strip wrap flags from vectorized reductions
Dec 20 2019, 5:11 AM
Ayal closed D69563: [LV] Strip wrap flags from vectorized reductions.
Dec 20 2019, 5:10 AM · Restricted Project

Dec 19 2019

Ayal accepted D69563: [LV] Strip wrap flags from vectorized reductions.

This looks good to me, with couple of final nits; thanks!

Dec 19 2019, 4:40 AM · Restricted Project

Dec 18 2019

Ayal added inline comments to D69563: [LV] Strip wrap flags from vectorized reductions.
Dec 18 2019, 2:38 PM · Restricted Project
Ayal added inline comments to D67905: [LV] Vectorizer should adjust trip count in profile information.
Dec 18 2019, 2:00 PM · Restricted Project
Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

No need for collectReductionInstructions() any more.

Dec 18 2019, 12:24 PM · Restricted Project

Dec 17 2019

Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

Probably not too much important because should be handled by the vector predicated instructions/intrinsics, but still

Good catch, binary operations that perform reduction must indeed be vectorized w/o wrap flags.

But this should apply to all such operations that participate in the vectorized part of the loop. Note that
(1) there may be several such add/sub instructions, as in llvm/test/Transforms/LoopVectorize/reduction.ll tests, and

Is there some existing API to find them all? Or I need to invite my own?

AFAIK such an API does not currently exist.

Would not it be easier just to not copy wrap flags in widenInstruction() for all instructions [which I was shy to do initially :) ] or it is too aggressive?

Loosing all wrap flags would be too aggressive.

Why is it ok not to drop nuw here:

define i8 @function0(i8 %a) {
entry:
  br label %for.body

for.body:
  %indvars.iv = phi i32 [ 0, %entry ], [ %indvars.iv.next, %if.end ]
  %cmp5 = icmp ult i8 %a, 127
  br i1 %cmp5, label %if.then, label %if.end

if.then:
  %mul = mul nuw i8 %a, 2
  br label %if.end

if.end:
  %k = phi i8 [ %mul, %if.then ], [ %a, %for.body ]
  %indvars.iv.next = add i32 %indvars.iv, 1
  %cmp = icmp slt i32 %indvars.iv.next, 42
  br i1 %cmp, label %for.body, label %for.end

for.end:
  ret i8 undef
}

Vector code generated is

vector.ph:                                        ; preds = %entry
  %broadcast.splatinsert1 = insertelement <4 x i8> undef, i8 %a, i32 0
  %broadcast.splat2 = shufflevector <4 x i8> %broadcast.splatinsert1, <4 x i8> undef, <4 x i32> zeroinitializer
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0
  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
  %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>
  %0 = add i32 %index, 0
  %1 = icmp ult <4 x i8> %broadcast.splat2, <i8 -128, i8 -128, i8 -128, i8 -128>
  %2 = mul nuw <4 x i8> %broadcast.splat2, <i8 2, i8 2, i8 2, i8 2>                                  ; if %a == 200, this is poison...
  %3 = xor <4 x i1> %1, <i1 true, i1 true, i1 true, i1 true>
  %predphi = select <4 x i1> %3, <4 x i8> %broadcast.splat2, <4 x i8> %2                     ; ... even though the %predphi == %a broadcasted, it's still poison as it depends on %2 (according to https://llvm.org/docs/LangRef.html#poisonvalues)
  %index.next = add i32 %index, 4
  %4 = icmp eq i32 %index.next, 40
  br i1 %4, label %middle.block, label %vector.body, !llvm.loop !0

Do I miss anything important here that allows us not to drop "nuw" flags?

Dec 17 2019, 11:58 PM · Restricted Project
Ayal added inline comments to D69563: [LV] Strip wrap flags from vectorized reductions.
Dec 17 2019, 12:28 AM · Restricted Project

Dec 8 2019

Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

Add test(s) having multiple wrapping reduction instructions.

Dec 8 2019, 11:48 AM · Restricted Project

Dec 7 2019

Ayal added a comment to D71071: [LV] Pick correct BB as insert point when fixing PHI for FORs..

Looks good to me with minor optional notes.
Consider adding a target-indepedent test, e.g., augmenting first-order-recurrence.ll; tests can possibly be committed before the fix, demonstrating current behavior and then the change brought by this fix.

I tried, but the test case requires tail folding and it seems to depend on an actual target. We make the decision to tail-fold in computeMaxVF (https://github.com/llvm/llvm-project/blob/c49194969430f0ee817498a7000a979a7a0ded03/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L4941) and use the max target VF, not the forced vector width. This seems a bit odd and we might want to change that, to make it easier to test tail-folding without any specific target. What do you think?

Dec 7 2019, 1:13 PM · Restricted Project

Dec 6 2019

Ayal added inline comments to D71071: [LV] Pick correct BB as insert point when fixing PHI for FORs..
Dec 6 2019, 1:51 AM · Restricted Project
Ayal accepted D71071: [LV] Pick correct BB as insert point when fixing PHI for FORs..

Looks good to me with minor optional notes.
Consider adding a target-indepedent test, e.g., augmenting first-order-recurrence.ll; tests can possibly be committed before the fix, demonstrating current behavior and then the change brought by this fix.

Dec 6 2019, 1:51 AM · Restricted Project

Dec 5 2019

Ayal added a comment to D67905: [LV] Vectorizer should adjust trip count in profile information.

Still think it would be better to provide this as a standalone function in Transforms/Utils/LoopUtils, for potential benefit of loop unroll (and jam) passes in addition to LV. Having agreed to ignore foldTail and requiresScalarEpilog, there's nothing vectorization-specific to do here. There's still an issue though with the fact that LV may use the scalar loop for both the remaining TC%(VF*UF) iterations when running the vector loop, and for all TC iterations when runtime guards bypass the vector loop. In absence of information, each such guard could be assigned 0.5 probability, or one could be aggressively optimistic and hope vector loop is always reached. In any case this deserves a comment.

Dec 5 2019, 5:00 AM · Restricted Project
Ayal accepted D69067: [LV] Record GEP widening decisions in recipe (NFCI).
Dec 5 2019, 1:43 AM · Restricted Project

Dec 3 2019

Ayal committed rG6ed9cef25f91: [LV] Scalar with predication must not be uniform (authored by Ayal).
[LV] Scalar with predication must not be uniform
Dec 3 2019, 9:56 AM
Ayal closed D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.
Dec 3 2019, 9:56 AM · Restricted Project

Dec 2 2019

Ayal added a comment to D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

It looks like this also fixes https://bugs.llvm.org/show_bug.cgi?id=43951. The reproducer there might be a good source for a test case as well.

Dec 2 2019, 3:02 PM · Restricted Project

Dec 1 2019

Ayal added a comment to D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

In the test case we predicate due to low trip count right? It might be worth calling that out.

Dec 1 2019, 3:08 PM · Restricted Project
Ayal updated the diff for D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

Simplified the test as suggested.

Dec 1 2019, 2:56 PM · Restricted Project
Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

Good catch, binary operations that perform reduction must indeed be vectorized w/o wrap flags.

But this should apply to all such operations that participate in the vectorized part of the loop. Note that
(1) there may be several such add/sub instructions, as in llvm/test/Transforms/LoopVectorize/reduction.ll tests, and

Is there some existing API to find them all? Or I need to invite my own?

Dec 1 2019, 8:23 AM · Restricted Project

Nov 28 2019

Ayal added a comment to D69563: [LV] Strip wrap flags from vectorized reductions.

Good catch, binary operations that perform reduction must indeed be vectorized w/o wrap flags.

Nov 28 2019, 8:05 AM · Restricted Project

Nov 27 2019

Ayal added a comment to D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

and rebased.

Nov 27 2019, 12:03 PM · Restricted Project
Ayal updated the diff for D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

Cleaned-up the #ifndef/LLVM_DEBUG and renamed the lambda.

Nov 27 2019, 12:03 PM · Restricted Project

Nov 26 2019

Ayal added a comment to D69067: [LV] Record GEP widening decisions in recipe (NFCI).

This looks good to me, thanks, plus a couple of comments.

Nov 26 2019, 8:16 AM · Restricted Project

Nov 20 2019

Ayal added a comment to D67805: [LV] Allow vectorization of hot short trip count loops with epilog.

Ideally we should be using loop size and expected perf gain from cost model in addition to loop hotness to make the decision but that would be much bigger change and can be a follow up step.

Nov 20 2019, 2:27 PM · Restricted Project
Ayal added a comment to D67905: [LV] Vectorizer should adjust trip count in profile information.

Adding a few comments. Would be good to generalize and apply also to loop unroll (and jam).

Nov 20 2019, 3:50 AM · Restricted Project

Nov 18 2019

Ayal added a comment to D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.

Nice! This indeed seems to solve the problem we saw in PR40816.

I don't know the code but I can at least provide some additional testing by including this patch in my weekend testing to see nothing unexpected pops up there.

Testing went well!

Nov 18 2019, 1:43 AM · Restricted Project

Nov 15 2019

Ayal created D70298: [LV] Avoid considering scalar-with-predication instructions as also uniform-after-vectorization, fix PR40816.
Nov 15 2019, 2:24 AM · Restricted Project

Nov 12 2019

Ayal added a comment to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).
  1. Such loops can and should be optimized to have a simple induction variable with constant trip count regardless of (and prior to) LV.

indvars does this transform. That said, it only runs once; in some cases, we manage to simplify the loop after indvars runs. Maybe it makes sense to run indvars again? We'd want to measure how much it actually triggers in practice.

Nov 12 2019, 12:19 PM · Restricted Project

Nov 9 2019

Ayal added a comment to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).

Testcase follows; reproduce with opt -loop-vectorize:

[snip]

Nov 9 2019, 2:22 PM · Restricted Project

Nov 7 2019

Ayal updated subscribers of D69228: [LV] Generalize conditions for sinking instrs for first order recurrences..

As revert eaff300 shows, more care must be taken to avoid having one FOR (First Order Recurring) phi sink an instruction that another FOR phi needs to keep in place. Namely, any Previous instruction that feeds a FOR phi across the backarc must not be allowed to sink, because that may break its dominance on other instructions who use the FOR phi. The original check against SinkAfter.count(Previous)) // Cannot rely on dominance due to motion addresses this concern but only partially; it works if the FOR phi that needs to sink an instruction is handled before the FOR phi that needs to keep it in place. But the two phi's could be encountered in the other order.

Nov 7 2019, 3:04 PM · Restricted Project

Nov 2 2019

Ayal added a comment to D68831: [LV] Mark instructions with loop invariant arguments as uniform. (WIP).

Thanks for coming back to look into this @fhahn !

Nov 2 2019, 4:35 PM · Restricted Project
Ayal added inline comments to D69228: [LV] Generalize conditions for sinking instrs for first order recurrences..
Nov 2 2019, 3:04 PM · Restricted Project

Nov 1 2019

Ayal added a comment to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).

Any other concern, guys?

Nov 1 2019, 2:04 PM · Restricted Project

Oct 31 2019

Ayal added a comment to D69067: [LV] Record GEP widening decisions in recipe (NFCI).

Having special treatment for GEPs here does follow the unique way in which GEPs tolerate both scalar/invariant operands and/or vector operands.

Oct 31 2019, 8:00 AM · Restricted Project

Oct 29 2019

Ayal accepted D69228: [LV] Generalize conditions for sinking instrs for first order recurrences..

This looks good to me, plus some final minor comments, thanks!

Oct 29 2019, 12:19 PM · Restricted Project

Oct 22 2019

Ayal added inline comments to D69228: [LV] Generalize conditions for sinking instrs for first order recurrences..
Oct 22 2019, 3:17 AM · Restricted Project

Oct 21 2019

Ayal added a comment to D68814: [LV] Allow assume calls in predicated blocks..

If conditional assumes are to be dropped, better do so on entry to VPlan, as in DeadInstructions, rather than representing them in ReplicateRecipe (as do unconditional assumes) and silencing their code generation.

To retain conditional assumes along with their control flow, they could be marked under isScalarWithPredication; but this complicates vectorization, plus what use are such assumes when all else is if-converted(?)

Conditional assumes under uniform control flow could be retained, along with the uniform control flow they depend upon; this may be mostly relevant for outerloop vectorization.

I thought it might be desirable to only drop the calls at the point where we know for certain that we cannot preserve them: when we emit them in a block that gets merge in its predecessor. By doing it in the recipe, it should also be applicable to the outer loop vectorization. Alternatively we can drop them all as a first step, if that is preferred. Please let me know :)

Oct 21 2019, 1:54 PM · Restricted Project

Oct 13 2019

Ayal added inline comments to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).
Oct 13 2019, 2:17 PM · Restricted Project
Ayal added inline comments to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).
Oct 13 2019, 4:26 AM · Restricted Project
Ayal added a comment to D68814: [LV] Allow assume calls in predicated blocks..

If conditional assumes are to be dropped, better do so on entry to VPlan, as in DeadInstructions, rather than representing them in ReplicateRecipe (as do unconditional assumes) and silencing their code generation.

Oct 13 2019, 3:32 AM · Restricted Project

Oct 9 2019

Ayal added inline comments to D68577: [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC).
Oct 9 2019, 9:21 AM · Restricted Project
Ayal accepted D68082: [LV] Emitting SCEV checks with OptForSize.

This LGTM, with additional CHECK-NOT to the test, thanks!

Oct 9 2019, 5:49 AM · Restricted Project

Oct 8 2019

Ayal added a comment to D68082: [LV] Emitting SCEV checks with OptForSize.

Comments addressed:

  • cleaned up the test case a bit.
Oct 8 2019, 2:33 PM · Restricted Project

Oct 5 2019

Ayal added a comment to D68082: [LV] Emitting SCEV checks with OptForSize.

The other suggested fix in LoopAccessInfo::collectStridedAccess() indeed deserves a separate patch, being an optimization opportunity rather than preventing an assert -- it's stride predicate is added early enough to be caught by CM.computeMaxVF()'s bailout conditions and avoid assert, as scev4stride1() test in above X86/optsize.ll checks.

Oct 5 2019, 11:10 AM · Restricted Project