
Tue, Aug 13

Ayal accepted D66106: [LV] fold-tail predication should be respected even with assume_safety.

LGTM, good catch!

Tue, Aug 13, 2:01 PM · Restricted Project

Jul 15 2019

Ayal added inline comments to D59995: [LV] Exclude loop-invariant inputs from scalar cost computation..
Jul 15 2019, 3:27 AM · Restricted Project

Jul 14 2019

Ayal accepted D59995: [LV] Exclude loop-invariant inputs from scalar cost computation..

LGTM, with some additional thoughts provoked by this fix.

Jul 14 2019, 8:04 AM · Restricted Project

Jul 12 2019

Ayal added a comment to D59995: [LV] Exclude loop-invariant inputs from scalar cost computation..

Some formatting typos, and a clarification of whether needsExtract() should assume vectorized or scalarized before scalars are computed.

Jul 12 2019, 4:27 AM · Restricted Project

Jul 11 2019

Ayal added a comment to D63981: [LV] Avoid building interleaved group in presence of WAW dependency.

[snip]
But I'm a bit worried that reporting such WAW dependencies by LoopAccessInfo will affect many other places...and possibly compile time. What do you think?

Jul 11 2019, 3:34 PM · Restricted Project

Jul 10 2019

Ayal added a comment to D63981: [LV] Avoid building interleaved group in presence of WAW dependency.

From the SUMMARY above:

I don't think we actually need to do any additional bookkeeping (as suggested in D57180). The current algorithm already examines all pairs and, in case of a dependence, invalidates the interleaving group. I think the real issue is that a WAW dependence (on the same iteration) is not collected by LoopAccessInfo in the first place, and as a result canReorderMemAccessesForInterleavedGroups returns the wrong answer. The fix is to explicitly check for WAW dependence in canReorderMemAccessesForInterleavedGroups.
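As a minimal, self-contained sketch of the extra check described above (hypothetical names, not the actual LoopAccessInfo/InterleavedAccessInfo code), the idea is to refuse reordering two writes that may touch the same location, i.e., the same-iteration WAW case that the dependence analysis does not report:

// Hypothetical stand-in for a memory access inside the loop body.
struct MemAccess {
  bool IsWrite;
  int LocationId; // -1 when the accessed location is unknown
};

static bool mayAlias(const MemAccess &X, const MemAccess &Y) {
  return X.LocationId < 0 || Y.LocationId < 0 || X.LocationId == Y.LocationId;
}

// Sketch of the reorder query: besides the dependences reported by the
// dependence analysis (not shown), explicitly refuse to reorder two writes
// that may access the same location -- the same-iteration WAW case.
static bool canReorderMemAccesses(const MemAccess &A, const MemAccess &B) {
  if (A.IsWrite && B.IsWrite && mayAlias(A, B))
    return false; // WAW: keep program order
  return true;
}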

Jul 10 2019, 5:14 PM · Restricted Project

Jun 28 2019

Ayal added a comment to D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency.

Here's an old draft along the lines discussed above; it probably deserves some updating or clean-ups:

Jun 28 2019, 6:10 AM · Restricted Project

May 14 2019

Ayal added a comment to D59995: [LV] Exclude loop-invariant inputs from scalar cost computation..

The culprit here is the assumption made by TTI.getOperandsScalarizationOverhead(Operands, VF) that all its Operands will be vectorized according to VF, and would thus require extraction to feed a scalarized/replicated user. But any such Operand might not get vectorized, and possibly must not get vectorized, e.g., due to an incompatible type as demonstrated by PR41294 and the testcase. In some cases an Operand will obviously not be vectorized, such as if it's loop-invariant or live-in. More generally, LV uses the following:

auto needsExtract = [&](Instruction *I) -> bool {
  return TheLoop->contains(I) && !isScalarAfterVectorization(I, VF);
};

which would require passing not only TheLoop into getScalarizationOverhead(I, VF, TTI) but also the CM --- better turn this static function into a method of CM?

Done.

May 14 2019, 3:07 PM · Restricted Project

Apr 5 2019

Ayal added a comment to D59995: [LV] Exclude loop-invariant inputs from scalar cost computation..

The culprit here is the assumption made by TTI.getOperandsScalarizationOverhead(Operands, VF) that all its Operands will be vectorized according to VF, and would thus require extraction to feed a scalarized/replicated user. But any such Operand might not get vectorized, and possibly must not get vectorized, e.g., due to an incompatible type as demonstrated by PR41294 and the testcase. In some cases an Operand will obviously not be vectorized, such as if it's loop-invariant or live-in. More generally, LV uses the following:

auto needsExtract = [&](Instruction *I) -> bool {
  return TheLoop->contains(I) && !isScalarAfterVectorization(I, VF);
};

which would require passing not only TheLoop into getScalarizationOverhead(I, VF, TTI) but also the CM --- better turn this static function into a method of CM?
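A hedged sketch of the direction suggested above (simplified, not the exact code that landed): make the overhead computation a method of the cost model, so it can filter operands through needsExtract() before asking TTI. Member and helper names follow the snippet above; the actual signatures in LLVM may differ.

unsigned
LoopVectorizationCostModel::getScalarizationOverhead(Instruction *I,
                                                     unsigned VF) {
  auto needsExtract = [&](Value *V) -> bool {
    auto *Op = dyn_cast<Instruction>(V);
    // Live-ins and loop-invariant operands are not vectorized, and operands
    // that stay scalar after vectorization need no extraction either.
    return Op && TheLoop->contains(Op) && !isScalarAfterVectorization(Op, VF);
  };

  SmallVector<const Value *, 4> ExtractedOperands;
  for (Value *Op : I->operands())
    if (needsExtract(Op))
      ExtractedOperands.push_back(Op);

  return TTI.getOperandsScalarizationOverhead(ExtractedOperands, VF);
}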

Apr 5 2019, 1:06 AM · Restricted Project

Feb 7 2019

Ayal added inline comments to D57837: [LV] Prevent interleaving if computeMaxVF returned None..
Feb 7 2019, 1:01 PM · Restricted Project
Ayal added a comment to D57837: [LV] Prevent interleaving if computeMaxVF returned None..

Thanks Ayal! I'll add the line before committing. Are you OK with the latest iteration going in?

Feb 7 2019, 12:24 PM · Restricted Project
Ayal added inline comments to D57837: [LV] Prevent interleaving if computeMaxVF returned None..
Feb 7 2019, 11:20 AM · Restricted Project
Ayal added inline comments to D57837: [LV] Prevent interleaving if computeMaxVF returned None..
Feb 7 2019, 9:02 AM · Restricted Project
Ayal added inline comments to D57837: [LV] Prevent interleaving if computeMaxVF returned None..
Feb 7 2019, 6:37 AM · Restricted Project

Feb 6 2019

Ayal accepted D57837: [LV] Prevent interleaving if computeMaxVF returned None..

This LGTM with the minor documentation comments and retention of UserIC.
Seems like we've reached a consensus, correct me if not.
Thanks!

Feb 6 2019, 3:42 PM · Restricted Project
Ayal added a comment to D57837: [LV] Prevent interleaving if computeMaxVF returned None..

I have a little mental barrier in accepting this change as is. I think this feeling of mine is mainly due to the name and the "stated" functionality of computeMaxVF, and the "indirect inference" of using it to suppress interleaving. Maybe I'm just being too picky here. If so, my apologies ahead of time.

Other than just the function name and the associated comment:

  1. With a runtime ptr check && hasBranchDivergence, is interleaving w/ VF=1 still okay?
  2. For interleaving, the remaining checks other than TC==0/1 (for better diagnostics) and OptForSize appear useless. Even if TC is evenly divisible by VF or canFoldTailByMasking, we still can't interleave under OptForSize.

    I think we should start thinking in terms of separate "is vectorization feasible" and "is interleaving feasible" (plus possibly "is unrolling feasible") questions, even if we evaluate those feasibilities in one function.

    Possibly going off on a tangent, but while we look at computeMaxVF: at the bottom we say "use pragma", but we bail out of plan() w/o checking whether UserVF exists or not.

    Hopefully, this conveys enough about my feeling of uneasiness.
Feb 6 2019, 3:27 PM · Restricted Project
Ayal added inline comments to D57837: [LV] Prevent interleaving if computeMaxVF returned None..
Feb 6 2019, 2:05 PM · Restricted Project
Ayal added a comment to D57382: [LV] Move interleave count computation to LVP::plan()..

The bug revealed by the test is rather subtle; does it match the original https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477 ?

LV decided (rightfully) not to vectorize the loop in the test, which has trip-count == 1, but does decide to interleave it (wrongfully, see below). LVP::plan() returns {VF=1,Cost=0} representing the decision not to vectorize, but w/o building any VPlan, which breaks the interleaving decision taken afterwards.

LVP::plan() builds a VPlan for VF=1 only to serve interleaving. So one simple possible fix for this bug is to always build such a VPlan. This patch tries to be smarter, and has LVP::plan() build this single VPlan only if interleaving will actually need it. Another way to build this VPlan only if needed is to build it on demand, if it's missing, once IC is determined (and before/at setBestPlan(), which raised the assert).

But note that LVP::plan() refrains from building a VPlan for VF=1 only if MaybeMaxVF is None, meaning "vectorization should be avoided up front". In such cases, interleaving should also be avoided up front! This "up frontness" can be propagated from CM.computeMaxVF() to CM.selectInterleaveCount() in one of several ways: raising a "CM.up-front" flag, or having LVP::plan() return (1) an Optional<VectorizationFactor>; (2) a std::tie(VF, NoIC), similar to the current patch, but where NoIC is a boolean indicating IC should be set to 1; or (3) a {VF=0,Cost=0} to denote that vectorization and interleaving are to be avoided up front --- note that {VF=1,Cost=0} *also* represents UserVF=1, which *is* subject to interleaving.

It would be good to build VPlans more selectively to save compile-time, and possibly refactor how VF and IC decisions are taken, but better in a separate patch from one that fixes an erroneously missing VPlan?
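As a hypothetical illustration of alternative (1) above (stand-in types, not the actual LLVM interface), an empty Optional from plan() would signal that both vectorization and interleaving are to be avoided up front, while {VF=1,Cost=0} would keep meaning "don't vectorize, but interleaving is still possible":

#include <optional>

// Stand-in for LLVM's VectorizationFactor; simplified.
struct VectorizationFactor {
  unsigned Width;
  unsigned Cost;
};

// Sketch of LVP::plan(): no value means "avoid up front", so no VPlans are built.
std::optional<VectorizationFactor> plan(std::optional<unsigned> MaybeMaxVF) {
  if (!MaybeMaxVF)
    return std::nullopt; // vectorization *and* interleaving avoided up front
  // ... build VPlans for VF = 1 .. *MaybeMaxVF and pick the most profitable ...
  return VectorizationFactor{1, 0}; // e.g., decided not to vectorize
}

// Caller: only run the interleaving heuristics when plan() produced a result.
unsigned decideInterleaveCount(const std::optional<VectorizationFactor> &VF) {
  if (!VF)
    return 1; // avoided up front: no interleaving either
  // ... otherwise defer to the usual selectInterleaveCount() heuristics ...
  return 1;
}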

Thanks Ayal! Do I understand correctly that you propose to first fix the missing-VPlan issue by making interleaving respect the fact that interleaving should be avoided when computeMaxVF returns None? If that is the intended behavior, then I think it makes sense to fix that first and I'll prepare a separate patch.

Feb 6 2019, 8:40 AM · Restricted Project
Ayal added a comment to D57382: [LV] Move interleave count computation to LVP::plan()..

The bug revealed by the test is rather subtle; does it match the original https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477 ?

Feb 6 2019, 7:51 AM · Restricted Project

Feb 1 2019

Ayal added a comment to D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency.

Also, regarding

  3. Same as above but instead of bailing out on grouping A with B, make sure that C is also sunk down with that group (as I think Hideki mentioned in the PR) (maybe a future improvement).

consider eliminating such WAW dependencies by sinking the stores and folding them into one, producing a single interleave group (by a future, separate patch).

Feb 1 2019, 1:14 PM · Restricted Project
Ayal added a comment to D57180: [LV] Avoid adding into interleaved group in presence of WAW dependency.

I plan on having a look later this week. I am a little worried that the checks in-line here are already quite complex and I would like to have a think if that could be improved in some way.

I agree; the algorithm makes sure that we visit everything between B and A, including C, before we visit A, so we have a chance to identify the (potentially) interfering store C before we reach A. This is what allows the algorithm to compare only the pairs (A,B) without having to also scan everything in between each time.

So I think the bug is that when we visited C and found that it could be inserted into B's group dependence-wise, but it wasn't inserted due to other reasons, we should have either:

  1. Invalidated the group (which is over-aggressive but better than wrong code)
  2. Recorded in B's Group the index where C could have been inserted, to "burn" that index from allowing some other instruction A to become a group member at that index; so when we reach A we see its spot is taken. (I think this will have the same effect as the proposed patch but without the extra scan; see the sketch after this exchange.)
  3. Same as above but instead of bailing out on grouping A with B, make sure that C is also sunk down with that group (as I think Hideki mentioned in the PR) (maybe a future improvement).

If you don't like the current approach, I agree 2) achieves the same thing, with extra bookkeeping (or extra state in the existing bookkeeping). I think 1) is too conservative: even if C is next to B, B's group can still be extended in the other direction. 3) should be done separately from the bug fix. Anyway, do we ever deal with so many loads/stores that the efficiency of avoiding the extra scan actually matters? I'm just curious.
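A small, self-contained sketch of option 2 (hypothetical bookkeeping, not the actual InterleaveGroup API): record "burned" indices for accesses that could have joined the group dependence-wise but were skipped, so a later access cannot take that slot and be reordered across them.

#include <map>
#include <set>

// Hypothetical, simplified interleave-group bookkeeping.
struct InterleaveGroupSketch {
  std::map<int, const void *> Members; // index -> member instruction
  std::set<int> Burned;                // indices blocked by skipped accesses

  // Returns false if the slot is already taken or was burned earlier.
  bool insertMember(const void *Instr, int Index) {
    if (Burned.count(Index) || Members.count(Index))
      return false;
    Members[Index] = Instr;
    return true;
  }

  // Called when an access (like C above) fits dependence-wise but is rejected
  // for other reasons: block its slot so a later access (like A) cannot be
  // grouped past it.
  void burnIndex(int Index) { Burned.insert(Index); }
};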

Feb 1 2019, 8:04 AM · Restricted Project

Jan 29 2019

Ayal added a comment to D30247: Epilog loop vectorization.

This patch is a little old and not updated with the latest vectorization changes.

In the RFC a few concerns were raised around the new block layout:
a) If the original loop's alias check fails, then jump directly to the scalar loop.
b) On the critical path there should be no extra checks, i.e., when the minimum iteration check fails for the original vector loop, jump directly to the scalar loop. In my opinion this is a possible loss of an optimization opportunity, because when the minimum iteration check for the original loop fails, we can still try executing the epilog loop, which requires the epilog loop's own minimum iteration check. We can keep this under an option.

I guess both cases can be handled; I'll re-initiate the discussion and update this patch soon.

Is there any update on this patch? I find it very useful for both cycles and size on targets that have masked operations not just for loads and stores but also for other operations, including intra operations.

Jan 29 2019, 8:15 AM · Restricted Project

Jan 11 2019

Ayal added a comment to D56551: [LoopVectorizer] give more advice in remark about failure to vectorize call.

This LGTM too; just adding mtcw, wondering whether these extra checks for more accurate reporting are worth placing under allowExtraAnalysis(), and/or whether TLI->isFunctionVectorizable() shouldn't be the one reporting the cause of its failure when returning false.

Jan 11 2019, 10:20 AM

Dec 19 2018

Ayal accepted D55798: [LAA] Avoid generating RT checks for known deps preventing vectorization..

This LGTM, thanks.

Dec 19 2018, 11:39 PM
Ayal added inline comments to D55798: [LAA] Avoid generating RT checks for known deps preventing vectorization..
Dec 19 2018, 2:43 PM
Ayal added inline comments to D55798: [LAA] Avoid generating RT checks for known deps preventing vectorization..
Dec 19 2018, 4:31 AM

Dec 18 2018

Ayal accepted D54892: [LAA] Introduce enum for vectorization safety status (NFC)..

LGTM, thanks.

Dec 18 2018, 12:41 PM
Ayal added inline comments to D55798: [LAA] Avoid generating RT checks for known deps preventing vectorization..
Dec 18 2018, 12:31 PM
Ayal added inline comments to D54892: [LAA] Introduce enum for vectorization safety status (NFC)..
Dec 18 2018, 11:24 AM

Dec 17 2018

Ayal added inline comments to D54892: [LAA] Introduce enum for vectorization safety status (NFC)..
Dec 17 2018, 3:45 PM

Dec 12 2018

Ayal added a comment to D53865: [LoopVectorizer] Improve computation of scalarization overhead..

Are you proposing some kind of search over instruction sequences with some limited lookahead?

Yes, something like this.

Dec 12 2018, 1:39 PM

Nov 28 2018

Ayal added a comment to D53865: [LoopVectorizer] Improve computation of scalarization overhead..

Making better decisions what to vectorize and what to keep scalar is clearly useful enough to include in the loop vectorizer. However, this should best be done in a target independent way; e.g., how computePredInstDiscount() and sinkScalarOperands() work to expand the scope of scalarized instructions according to the cumulative cost discount of potentially scalarized instruction chains. Unless there's a good reason for it to be target specific(?)

Nov 28 2018, 12:45 PM

Nov 27 2018

Ayal added inline comments to D54892: [LAA] Introduce enum for vectorization safety status (NFC)..
Nov 27 2018, 1:11 PM

Nov 26 2018

Ayal added inline comments to D54892: [LAA] Introduce enum for vectorization safety status (NFC)..
Nov 26 2018, 11:28 PM

Nov 18 2018

Ayal accepted D54538: [LV] Avoid vectorizing unsafe dependencies in uniform address.

LGTM, with minor optional comments. It would be good to add a test where the load is preceded by a store in the loop, indicating that such a dependence is also non-vectorizable.

Nov 18 2018, 11:08 PM

Nov 14 2018

Ayal added inline comments to D54538: [LV] Avoid vectorizing unsafe dependencies in uniform address.
Nov 14 2018, 1:41 PM

Nov 3 2018

Ayal added a comment to D53612: [LV] Avoid vectorizing loops under opt for size that involve SCEV checks.

...
One thing I noticed is that if I use the test case from PR39417 and add -vectorizer-min-trip-count=3, to avoid the detection of a "very small trip count", the loop will be vectorized with VF=16. That is also what happened when we triggered the assert (without this patch). Shouldn't the VF be clamped to the trip count?
It seems like the vectorizer detects that the trip count is tiny (trip count is 3), but it vectorizes using VF=16, and then the vectorized loop is skipped since we emit br i1 true, label %scalar.ph, label %vector.scevcheck. So all the hard work of vectorizing the loop is just a waste of time; or could it be beneficial to have VF > trip count in some cases?

If the actual problem is that VF should be clamped to the trip count, then maybe this patch just hides that problem in certain cases (when having OptForSize).

Nov 3 2018, 11:00 PM
Ayal added inline comments to D52685: [LoopVectorizer] Adjust heuristics for a truncated load.
Nov 3 2018, 9:39 AM

Nov 2 2018

Ayal updated subscribers of D52685: [LoopVectorizer] Adjust heuristics for a truncated load.

We were wondering why not simply consider the scalar type of all not-to-be-ignored instructions in the loop instead. We're traversing them all anyway here.

Nov 2 2018, 3:24 AM

Nov 1 2018

Ayal updated the diff for D53612: [LV] Avoid vectorizing loops under opt for size that involve SCEV checks.

Addressed comments.
Added another test case derived from PR39497.
Updated to trunk before committing.

Nov 1 2018, 5:11 PM

Oct 30 2018

Ayal accepted D53668: [LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads.

Also added a test with stride 3.

Oct 30 2018, 1:58 AM

Oct 29 2018

Ayal added a comment to D53668: [LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads.

Looking at tests next.

Oct 29 2018, 3:12 AM

Oct 23 2018

Ayal created D53612: [LV] Avoid vectorizing loops under opt for size that involve SCEV checks.
Oct 23 2018, 2:18 PM

Oct 21 2018

Ayal added inline comments to D53420: [IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when optimizing for size.
Oct 21 2018, 10:22 PM
Ayal accepted D53420: [IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when optimizing for size.

LGTM, only minor comments and optional suggestions.

Oct 21 2018, 8:19 AM

Oct 11 2018

Ayal accepted D53011: [LV] Add support for vectorizing predicated strided accesses using masked interleave-group.

LGTM; just a minor comment trying to keep unrelated NFC changes away.

Oct 11 2018, 5:14 AM

Oct 9 2018

Ayal added inline comments to D53011: [LV] Add support for vectorizing predicated strided accesses using masked interleave-group.
Oct 9 2018, 6:06 AM
Ayal added inline comments to D53011: [LV] Add support for vectorizing predicated strided accesses using masked interleave-group.
Oct 9 2018, 4:05 AM

Oct 5 2018

Ayal accepted D52656: [LV] Teach vectorizer about variant value store into uniform address.

Thanks, added only minor optional suggestions.

Oct 5 2018, 6:30 PM

Oct 4 2018

Ayal added inline comments to D52656: [LV] Teach vectorizer about variant value store into uniform address.
Oct 4 2018, 3:44 PM

Oct 2 2018

Ayal accepted D52682: [IAI,LV] Avoid creating interleave-groups for predicated accesses.

Nice catch, added only minor optional comments.

Oct 2 2018, 6:54 AM

Sep 24 2018

Ayal accepted D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

Thanks for taking care of everything, this LGTM now, added only a few minor optional comments.

Sep 24 2018, 2:55 PM

Sep 12 2018

Ayal added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

Best allow only a single store to an invariant address for now; until we're sure the last one to store is always identified correctly.

Sep 12 2018, 5:27 AM

Sep 10 2018

Ayal added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Sep 10 2018, 4:54 PM

Sep 7 2018

Ayal added a comment to D51313: [LV] Fix code gen for conditionally executed uniform loads.

(post commit review)

Sep 7 2018, 9:35 AM

Aug 28 2018

Ayal added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Aug 28 2018, 3:07 PM
Ayal added a comment to D50823: [VPlan] Introduce VPCmpInst sub-class in the instruction-level representation.

Jumping from D50480:

This patch aims to model a rather special early-exit condition that restricts the execution of the entire loop body to certain iterations, rather than model general compare instructions. If preferred, an "EarlyExit" extended opcode can be introduced instead of the controversial ICmpULE. This should be easy to revisit in the future if needed.

This patch is fine as is, or rather much better with ICmpULE than EarlyExit.

This patch focuses on modeling an early-exit compare and then generating it, w/o making strategic design decisions supporting future vplan-to-vplan transformations, the interfaces they may need, potential templatization, or other long-term high-level VPlan concerns. These should be explained and discussed separately along with pros and cons of alternative solutions for supporting the desired interfaces and for holding their storage, including subclassing VPInstructions, using detached Instructions, or other possibilities.

Sure. I agree.

[Full disclosure] I have a big mental barrier in accepting your "early-exit" terminology here, since I relate that term to "break out of the loop", but that's just a terminology difference. Nothing to do with the substance of this patch. [End of full disclosure]

Regarding "using detached Instructions". I fully go against that because that'll forever prohibit moving the VPlan/VPInstructions into Analysis. IR Verifier will trigger if there is a detached IR Instruction at the end of an Analysis pass. I already had a hallway chat with @lattner about a possibility of using IR Instructions and IR CFG in the detached mode (and that also requires many utilities to be usable in detached mode) and he was totally pessimistic about it. That was two years ago at 2016 Developer Conference, but nothing really has changed since then in that regard. That was the end of my hope for using detached IR Instructions, instead of introducing VPInstructions. Detached Instructions under the hood of VPInstructions is not very useful if we can't keep them between vectorization Analysis pass and vectorization Transformation pass.

Aug 28 2018, 7:46 AM
Ayal added a comment to D51313: [LV] Fix code gen for conditionally executed uniform loads.

The above holds also for conditional loads from non-uniform addresses, that can turn into gathers, but possibly also get incorrectly scalarized w/o branches.

Aug 28 2018, 6:42 AM

Aug 27 2018

Ayal added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Aug 27 2018, 4:41 PM

Aug 26 2018

Ayal added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

This is what the langref states for scatter intrinsic (https://llvm.org/docs/LangRef.html#id1792):

The data stored in memory is a vector of any integer, floating-point or pointer data type. Each vector element is stored in an arbitrary memory address. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.
Aug 26 2018, 9:29 AM
Ayal added a comment to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

Reverted to use the original ICmpULE extended opcode instead of detached ICmpInst. This can be revised quite easily once VPInstructions acquire any other form of modeling compares.

Since the VPCmpInst code is ready (D50823) and this is a clear use case where we need to model a new compare (including its predicate) that is not in the input IR, I'd appreciate if we could discuss a bit more about using the VPCmpInst approach. At least, I'd like to understand what are the concerns about the VPCmpInst approach and what other people think.

I do have concerns regarding modeling ICmpULE as an opcode only for compare instructions newly created during a VPlan-to-VPlan transformation. For example:

...

Aug 26 2018, 7:21 AM

Aug 23 2018

Ayal added inline comments to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.
Aug 23 2018, 12:08 PM

Aug 22 2018

Ayal added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

...

Yes, the stores are scalarized. Identical replicas are left as-is. Either passes such as load elimination can remove them, or we can clean them up in LV itself.

  • by revisiting LoopVectorizationCostModel::collectLoopUniforms()? ;-)

Right now, I just run instcombine after loop vectorization to clean up those unnecessary stores (and the test cases make sure there's only one store left). It looks like there are other places in LV which rely on InstCombine as the clean-up pass, so it may not be that bad after all? Thoughts?

Aug 22 2018, 2:24 PM
Ayal added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 22 2018, 9:17 AM
Ayal updated the diff for D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

Addressed review comments, rebased, and added a couple of asserts.

Aug 22 2018, 8:39 AM
Ayal added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 22 2018, 5:38 AM

Aug 20 2018

Ayal added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

...

Yes, the stores are scalarized. Identical replicas are left as-is. Either passes such as load elimination can remove them, or we can clean them up in LV itself.

Aug 20 2018, 4:07 PM
Ayal added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 20 2018, 3:08 PM
Ayal updated the diff for D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

Addressed review comments.

Aug 20 2018, 3:01 PM
Ayal accepted D50778: [LV] Vectorize loops where non-phi instructions used outside loop.
Aug 20 2018, 1:16 PM

Aug 19 2018

Ayal added inline comments to D50778: [LV] Vectorize loops where non-phi instructions used outside loop.
Aug 19 2018, 1:02 AM

Aug 15 2018

Ayal added a comment to D50778: [LV] Vectorize loops where non-phi instructions used outside loop.

Suggest updating InnerLoopVectorizer::fixLCSSAPHIs() as follows, now that arbitrary values are allowed to be live-out:

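// Live-outs that are uniform after vectorization can be read from lane 0;
// otherwise read the last lane of the last unrolled part.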
unsigned LastLane = Cost->isUniformAfterVectorization(IncomingValue, VF) ? 0 : VF - 1;
Value *lastIncomingValue =
    getOrCreateScalarValue(IncomingValue, {UF - 1, LastLane});
Aug 15 2018, 3:15 PM
Ayal added a comment to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

I have a general question about direction, not specific to this patch.

It seems like we're adding a specific form of predication to the vectorizer in this patch and I know we already have support for various predicated load and store idioms. What are our plans in terms of supporting more general predication? For instance, I don't believe we handle loops like the following at the moment:

for (int i = 0; i < N; i++) {
  if (unlikely(i > M))
    break;
  sum += a[i];
}

Can the infrastructure in this patch be generalized to handle such cases? And if so, are there any specific plans to do so?

Aug 15 2018, 12:54 PM

Aug 14 2018

Ayal added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

The decision how to vectorize invariant stores also deserves attention: LoopVectorizationCostModel::setCostBasedWideningDecision() considers loads from uniform addresses, but not invariant stores - these may end up being scalarized or becoming a scatter; the former is preferred in this case, as the identical scalarized replicas can later be removed. In any case associated cost estimates should be provided to support overall vectorization costs. Note that vectorizing conditional invariant stores deserves special attention. Unconditional invariant stores are candidates to be sunk out of the loop, preferably before trying to vectorize it. One approach to vectorize a conditional invariant store is to check if its mask is all false, and if not to perform a single invariant scalar store, for lack of a masked-scalar-store instruction. May be worth distinguishing between uniform and divergent conditions; this check is easier to carry out in the former case.
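A tiny, self-contained sketch of the "check the mask, then do one scalar store" idea for a conditional invariant store (hypothetical helper, not LV code); per the scatter semantics quoted earlier in this feed, the last active lane wins:

// Semantics of a masked store to a single invariant address: if any mask lane
// is set, exactly one scalar store is performed, taking the value of the last
// active lane (least-to-most-significant ordering, as with scatter).
template <unsigned VF>
void conditionalInvariantStore(const bool (&Mask)[VF], const int (&Val)[VF],
                               int *InvariantAddr) {
  bool AnyActive = false;
  int StoredVal = 0;
  for (unsigned Lane = 0; Lane < VF; ++Lane) {
    if (Mask[Lane]) {
      AnyActive = true;
      StoredVal = Val[Lane]; // later lanes overwrite earlier ones
    }
  }
  if (AnyActive)
    *InvariantAddr = StoredVal; // single scalar store, only if the mask is not all-false
}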

Aug 14 2018, 1:53 PM
Ayal accepted D50579: [LV] Teach about non header phis that have uses outside the loop.
Aug 14 2018, 7:52 AM
Ayal added a comment to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

Do you see any potential issue that could make modeling this in the VPlan native path complicated once we have predication?

Aug 14 2018, 12:25 AM

Aug 13 2018

Ayal added a comment to D50579: [LV] Teach about non header phis that have uses outside the loop.

Overall looks good to me, though it could be cleaned up a bit more?

Aug 13 2018, 2:12 PM

Aug 12 2018

Ayal added inline comments to D50579: [LV] Teach about non header phis that have uses outside the loop.
Aug 12 2018, 3:03 PM

Aug 11 2018

Ayal added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 11 2018, 2:06 PM
Ayal added inline comments to D50474: [LV] Vectorize header phis that feed from if-convertable latch phis.
Aug 11 2018, 2:11 AM

Aug 9 2018

Ayal added inline comments to D50474: [LV] Vectorize header phis that feed from if-convertable latch phis.
Aug 9 2018, 3:27 PM

Aug 8 2018

Ayal created D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 8 2018, 3:11 PM

Jun 14 2018

Ayal added inline comments to D48048: [LV] Prevent LV to run cost model twice for VF=2.
Jun 14 2018, 3:16 PM

Jun 12 2018

Ayal added inline comments to D48048: [LV] Prevent LV to run cost model twice for VF=2.
Jun 12 2018, 1:02 PM

May 1 2018

Ayal added inline comments to D46126: [SLP] Vectorize transposable binary operand bundles.
May 1 2018, 12:31 PM
Ayal added inline comments to D46126: [SLP] Vectorize transposable binary operand bundles.
May 1 2018, 8:46 AM

Apr 30 2018

Ayal added a comment to D46126: [SLP] Vectorize transposable binary operand bundles.

This is reminiscent of LV's interleave group optimization, in the sense that a couple of correlated inefficient vector "gathers" are replaced by a couple of efficiently formed vectors followed by transposing shuffles. The correlated gathers may come from the two operands of a binary operation, as in this patch, or more generally from arbitrary leaves of the SLP tree.

Apr 30 2018, 8:48 AM

Apr 1 2018

Ayal accepted D43776: [SLP] Fix PR36481: vectorize reassociated instructions..

Looks good to me, thanks for addressing the issues; I have only a few last minor suggestions.

Apr 1 2018, 12:31 AM

Mar 23 2018

Ayal added a comment to D43776: [SLP] Fix PR36481: vectorize reassociated instructions..

Have test(s) for extractvalues, for completeness.
Make sure tests cover best-order selection: cases where the original order is just as frequent as other orders (tie-break), less frequent, or more frequent.

Mar 23 2018, 4:58 PM

Mar 18 2018

Ayal added a comment to D44523: Change calculation of MaxVectorSize.

See MaximizeBandwidth, as in
the llvm-dev thread "Enable vectorizer-maximize-bandwidth by default?"
and the patch "Enable vectorizer-maximize-bandwidth by default",
which is still reverted afaik: r306936 - Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."

Mar 18 2018, 1:52 PM

Mar 9 2018

Ayal added inline comments to D43776: [SLP] Fix PR36481: vectorize reassociated instructions..
Mar 9 2018, 2:44 PM

Mar 7 2018

Ayal added a comment to D43776: [SLP] Fix PR36481: vectorize reassociated instructions..

This patch addresses the following TODO, plus handles extracts:

Mar 7 2018, 11:39 PM

Feb 27 2018

Ayal resigned from D43812: [LV] Let recordVectorLoopValueForInductionCast to check if IV was created from the cast..
Feb 27 2018, 12:01 PM

Feb 21 2018

Ayal resigned from D43536: [LV] Fix for PR36311, vectorizer's isUniform() abuse triggers assert in SCEV.
Feb 21 2018, 3:26 AM

Feb 10 2018

Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Hi Ayal, Sanjoy,

The last update's review has been pending for a while. Of late, SLP has had lots of changes, so I will have to rebase, but before rebasing please see if any more changes are required in its current form.

Thanks in advance.

Feb 10 2018, 11:48 PM

Feb 1 2018

Ayal added inline comments to D42123: Derive GEP index type from Data Layout.
Feb 1 2018, 2:49 AM

Jan 15 2018

Ayal added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Jan 15 2018, 11:49 PM

Jan 11 2018

Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..
In D36130#971181, @Ayal wrote:

This should fix the case observed by @sanjoy in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171218/511721.html; please also include a testcase.

The test case, test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll, is already included.

Jan 11 2018, 1:15 PM

Jan 9 2018

Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..

This should fix the case observed by @sanjoy in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171218/511721.html; please also include a testcase.

Jan 9 2018, 8:40 AM

Dec 29 2017

Ayal added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Dec 29 2017, 7:31 AM