Ayal (Ayal Zaks)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 12 2015, 1:48 PM (175 w, 18 h)

Recent Activity

Yesterday

Ayal accepted D54538: [LV] Avoid vectorizing unsafe dependencies in uniform address.

LGTM, with minor optional comments. It would be good to add a test where the load is preceded by a store in the loop, showing that such a dependence is also non-vectorizable.

Sun, Nov 18, 11:08 PM
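A minimal model of why a store to a uniform address followed by a load from it is unsafe to vectorize. It assumes one plausible naive code generation: the store is replicated for every lane, then the uniform load is emitted once per vector iteration and broadcast, so every lane observes the last lane's store. The function names are hypothetical, and the trip count is assumed to be a multiple of VF.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar semantics of: for (i) { *p = a[i]; b[i] = *p; }
// Each iteration reloads what it just stored, so b[i] == a[i].
std::vector<int> scalarLoop(const std::vector<int>& a) {
  int p = 0;  // models the single uniform (loop-invariant) address
  std::vector<int> b(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    p = a[i];
    b[i] = p;
  }
  return b;
}

// Hypothetical naive vectorization: store replicated per lane, then one
// uniform load broadcast to all lanes. Assumes a.size() % VF == 0.
std::vector<int> naiveVectorLoop(const std::vector<int>& a, std::size_t VF) {
  assert(VF > 0 && a.size() % VF == 0);
  int p = 0;
  std::vector<int> b(a.size());
  for (std::size_t i = 0; i < a.size(); i += VF) {
    for (std::size_t lane = 0; lane < VF; ++lane)
      p = a[i + lane];            // scalarized replicas of the store
    int broadcast = p;            // single uniform load, after all stores
    for (std::size_t lane = 0; lane < VF; ++lane)
      b[i + lane] = broadcast;    // every lane sees the last lane's value
  }
  return b;
}
```

With a = {1, ..., 8} and VF = 4, the naive version yields {4, 4, 4, 4, 8, 8, 8, 8} instead of a itself.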

Wed, Nov 14

Ayal added inline comments to D54538: [LV] Avoid vectorizing unsafe dependencies in uniform address.
Wed, Nov 14, 1:41 PM

Sat, Nov 3

Ayal added a comment to D53612: [LV] Avoid vectorizing loops under opt for size that involve SCEV checks.

...
One thing I noticed: if I use the test case from PR39417 and add -vectorizer-min-trip-count=3 to avoid the detection of a "very small trip count", the loop is vectorized with VF=16. That is also what happened when we triggered the assert (without this patch). Shouldn't the VF be clamped to the trip count?
It seems the vectorizer detects that the trip count is tiny (3), yet it vectorizes using VF=16, and then the vectorized loop is skipped because we emit br i1 true, label %scalar.ph, label %vector.scevcheck. So all the work of vectorizing the loop is wasted; or could it be beneficial to have VF > trip count in some cases?

If the actual problem is that VF should be clamped to the trip count, then maybe this patch just hides that problem in certain cases (when having OptForSize).

Sat, Nov 3, 11:00 PM
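The clamping asked about above can be sketched as follows. clampVFToTripCount and powerOf2Floor are hypothetical helpers, not LV's actual code, and a trip count of 0 stands for "unknown". Rounding down to a power of two keeps the clamped VF a power of two, matching the VFs LV considered at the time.

```cpp
#include <cassert>

// Largest power of two that does not exceed x (requires x >= 1).
unsigned powerOf2Floor(unsigned x) {
  assert(x >= 1);
  unsigned p = 1;
  while (p <= x / 2)
    p *= 2;
  return p;
}

// Clamp a candidate maximum VF to a known constant trip count TC.
// TC == 0 means the trip count is unknown, so nothing is clamped.
unsigned clampVFToTripCount(unsigned MaxVF, unsigned TC) {
  if (TC == 0 || TC >= MaxVF)
    return MaxVF;
  return powerOf2Floor(TC);
}
```

For the PR39417 scenario (trip count 3, VF 16), this sketch would pick VF = 2 instead of 16.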
Ayal added inline comments to D52685: [LoopVectorizer] Adjust heuristics for a truncated load.
Sat, Nov 3, 9:39 AM

Fri, Nov 2

Ayal updated subscribers of D52685: [LoopVectorizer] Adjust heuristics for a truncated load.

We were wondering why not simply consider the scalar types of all instructions in the loop that are not to be ignored. We're traversing them all here anyway.

Fri, Nov 2, 3:24 AM
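One reading of the suggestion: scan every instruction that is not ignored and take the smallest and widest scalar bit widths, rather than special-casing the truncated load. Inst is a hypothetical stand-in for an IR instruction, and starting the widest width at 8 bits is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR instruction in the loop body.
struct Inst {
  unsigned ScalarBits;  // bit width of the instruction's scalar type
  bool Ignored;         // e.g. a cast expected to fold away after vectorization
};

// Returns {smallest, widest} scalar width over the non-ignored instructions.
std::pair<unsigned, unsigned> smallestAndWidest(const std::vector<Inst>& Body) {
  unsigned Smallest = ~0u, Widest = 8;
  for (const Inst& I : Body) {
    if (I.Ignored)
      continue;
    Smallest = std::min(Smallest, I.ScalarBits);
    Widest = std::max(Widest, I.ScalarBits);
  }
  return {Smallest, Widest};
}
```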

Thu, Nov 1

Ayal updated the diff for D53612: [LV] Avoid vectorizing loops under opt for size that involve SCEV checks.

Addressed comments.
Added another test case derived from PR39497.
Updated to trunk before committing.

Thu, Nov 1, 5:11 PM

Tue, Oct 30

Ayal accepted D53668: [LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads .

Also added a test with stride 3.

Tue, Oct 30, 1:58 AM
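For a stride-3 test, the interleave group's wide load covers 3 x VF consecutive elements, and each member is then extracted with a strided shuffle. A sketch of that de-interleaving on plain vectors; deinterleave is a hypothetical name, and in IR each member would be one shufflevector with mask {m, m+3, m+6, ...}.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split one wide load of Factor * VF consecutive elements into Factor
// vectors: member m takes elements m, m + Factor, m + 2 * Factor, ...
std::vector<std::vector<int>> deinterleave(const std::vector<int>& Wide,
                                           std::size_t Factor) {
  assert(Factor > 0 && Wide.size() % Factor == 0);
  std::size_t VF = Wide.size() / Factor;
  std::vector<std::vector<int>> Members(Factor, std::vector<int>(VF));
  for (std::size_t m = 0; m < Factor; ++m)
    for (std::size_t lane = 0; lane < VF; ++lane)
      Members[m][lane] = Wide[m + lane * Factor];
  return Members;
}
```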

Mon, Oct 29

Ayal added a comment to D53668: [LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads .

Looking at tests next.

Mon, Oct 29, 3:12 AM

Tue, Oct 23

Ayal created D53612: [LV] Avoid vectorizing loops under opt for size that involve SCEV checks.
Tue, Oct 23, 2:18 PM

Sun, Oct 21

Ayal added inline comments to D53420: [IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when optimizing for size.
Sun, Oct 21, 10:22 PM
Ayal accepted D53420: [IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when optimizing for size.

LGTM, only minor comments and optional suggestions.

Sun, Oct 21, 8:19 AM

Oct 11 2018

Ayal accepted D53011: [LV] Add support for vectorizing predicated strided accesses using masked interleave-group.

LGTM; just a minor comment trying to keep unrelated NFC changes away.

Oct 11 2018, 5:14 AM
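For a predicated interleave group, each lane's predicate guards Factor consecutive elements of the wide load, so the block mask is replicated Factor times; for groups with gaps (as in D53668 above), gap members can be masked off too. A sketch combining both on plain vectors, with hypothetical names; whether LV combines the two masks exactly this way is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mask for a masked wide load of an interleave group. Lane L of BlockMask
// covers wide elements L*Factor .. L*Factor + Factor - 1; member slots that
// are gaps (MemberPresent[m] == false) are always masked off.
std::vector<bool> interleavedGroupMask(const std::vector<bool>& BlockMask,
                                       std::size_t Factor,
                                       const std::vector<bool>& MemberPresent) {
  assert(MemberPresent.size() == Factor);
  std::vector<bool> Mask;
  Mask.reserve(BlockMask.size() * Factor);
  for (bool LaneActive : BlockMask)
    for (std::size_t m = 0; m < Factor; ++m)
      Mask.push_back(LaneActive && MemberPresent[m]);
  return Mask;
}
```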

Oct 9 2018

Ayal added inline comments to D53011: [LV] Add support for vectorizing predicated strided accesses using masked interleave-group.
Oct 9 2018, 6:06 AM
Ayal added inline comments to D53011: [LV] Add support for vectorizing predicated strided accesses using masked interleave-group.
Oct 9 2018, 4:05 AM

Oct 5 2018

Ayal accepted D52656: [LV] Teach vectorizer about variant value store into uniform address.

Thanks, added only minor optional suggestions.

Oct 5 2018, 6:30 PM

Oct 4 2018

Ayal added inline comments to D52656: [LV] Teach vectorizer about variant value store into uniform address.
Oct 4 2018, 3:44 PM

Oct 2 2018

Ayal accepted D52682: [IAI,LV] Avoid creating interleave-groups for predicated accesses.

Nice catch, added only minor optional comments.

Oct 2 2018, 6:54 AM

Sep 24 2018

Ayal accepted D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

Thanks for taking care of everything, this LGTM now, added only a few minor optional comments.

Sep 24 2018, 2:55 PM

Sep 12 2018

Ayal added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

It is best to allow only a single store to an invariant address for now, until we're sure the last store is always identified correctly.

Sep 12 2018, 5:27 AM

Sep 10 2018

Ayal added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Sep 10 2018, 4:54 PM

Sep 7 2018

Ayal added a comment to D51313: [LV] Fix code gen for conditionally executed uniform loads.

(post commit review)

Sep 7 2018, 9:35 AM

Aug 28 2018

Ayal added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Aug 28 2018, 3:07 PM
Ayal added a comment to D50823: [VPlan] Introduce VPCmpInst sub-class in the instruction-level representation.

Jumping from D50480:

This patch aims to model a rather special early-exit condition that restricts the execution of the entire loop body to certain iterations, rather than model general compare instructions. If preferred, an "EarlyExit" extended opcode can be introduced instead of the controversial ICmpULE. This should be easy to revisit in the future if needed.

This patch is fine as is, or rather much better with ICmpULE than EarlyExit.

This patch focuses on modeling an early-exit compare and then generating it, w/o making strategic design decisions supporting future vplan-to-vplan transformations, the interfaces they may need, potential templatization, or other long-term high-level VPlan concerns. These should be explained and discussed separately along with pros and cons of alternative solutions for supporting the desired interfaces and for holding their storage, including subclassing VPInstructions, using detached Instructions, or other possibilities.

Sure. I agree.

[Full disclosure] I have a big mental barrier in accepting your "early-exit" terminology here since I relate that term to "break out of the loop", but that's just the terminology difference. Nothing to do with the substance of this patch. [End of full disclosure]

Regarding "using detached Instructions": I'm fully against that, because it would forever prohibit moving VPlan/VPInstructions into Analysis. The IR Verifier will trigger if there is a detached IR Instruction at the end of an Analysis pass. I already had a hallway chat with @lattner about the possibility of using IR Instructions and the IR CFG in detached mode (which also requires many utilities to be usable in detached mode), and he was very pessimistic about it. That was two years ago, at the 2016 Developer Conference, but nothing has really changed since then in that regard. That was the end of my hope of using detached IR Instructions instead of introducing VPInstructions. Detached Instructions under the hood of VPInstructions are not very useful if we can't keep them between the vectorization Analysis pass and the vectorization Transformation pass.

Aug 28 2018, 7:46 AM
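The early-exit compare discussed here, used to vectorize loops of arbitrary trip count without a remainder (D50480), can be modeled as a per-lane mask: lane L of the vector iteration starting at scalar index Base is active iff Base + L <= BTC, where BTC = TripCount - 1 is the backedge-taken count. Comparing against BTC with ULE, rather than against the trip count with ULT, presumably sidesteps the case where the trip count itself overflows its type. The names are hypothetical; this sketch uses 32-bit unsigned arithmetic and assumes small trip counts so that Base + L does not wrap.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Header mask for one vector iteration: lane L is active iff Base + L <= BTC
// (an unsigned "ULE" compare against the backedge-taken count).
std::vector<bool> headerMask(std::uint32_t Base, std::uint32_t VF, std::uint32_t BTC) {
  std::vector<bool> Mask(VF);
  for (std::uint32_t L = 0; L < VF; ++L)
    Mask[L] = (Base + L) <= BTC;
  return Mask;
}

// Steps through the whole loop VF lanes at a time and counts active lanes;
// this should equal the trip count, with no scalar remainder loop.
std::uint64_t countActiveLanes(std::uint32_t TripCount, std::uint32_t VF) {
  assert(TripCount > 0 && VF > 0);
  std::uint32_t BTC = TripCount - 1;
  std::uint64_t Active = 0;
  for (std::uint64_t Base = 0; Base < TripCount; Base += VF)
    for (bool B : headerMask(static_cast<std::uint32_t>(Base), VF, BTC))
      Active += B;
  return Active;
}
```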
Ayal added a comment to D51313: [LV] Fix code gen for conditionally executed uniform loads.

The above also holds for conditional loads from non-uniform addresses, which can turn into gathers but may also be incorrectly scalarized without branches.

Aug 28 2018, 6:42 AM

Aug 27 2018

Ayal added inline comments to D51313: [LV] Fix code gen for conditionally executed uniform loads.
Aug 27 2018, 4:41 PM

Aug 26 2018

Ayal added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

This is what the LangRef states for the scatter intrinsic (https://llvm.org/docs/LangRef.html#id1792):

The data stored in memory is a vector of any integer, floating-point or pointer data type. Each vector element is stored in an arbitrary memory address. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.
Aug 26 2018, 9:29 AM
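Read together with the LangRef ordering guarantee quoted above, a scatter whose lanes all target the same invariant address leaves the most-significant lane's value in memory, which matches the scalar loop's last store. A small model of that ordering; scatter here is a hypothetical stand-in for llvm.masked.scatter with an all-true mask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Writes lanes in order from least- to most-significant, so when addresses
// overlap, the highest lane's value is the one that remains in memory.
void scatter(std::vector<int>& Mem, const std::vector<int>& Values,
             const std::vector<std::size_t>& Addrs) {
  assert(Values.size() == Addrs.size());
  for (std::size_t Lane = 0; Lane < Values.size(); ++Lane) {
    assert(Addrs[Lane] < Mem.size());
    Mem[Addrs[Lane]] = Values[Lane];
  }
}
```

Scattering {10, 20, 30, 40} to address 2 in every lane leaves 40 there, the same value a scalar loop storing 10, 20, 30, 40 in sequence would leave.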
Ayal added a comment to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

Reverted to use the original ICmpULE extended opcode instead of detached ICmpInst. This can be revised quite easily once VPInstructions acquire any other form of modeling compares.

Since the VPCmpInst code is ready (D50823) and this is a clear use case where we need to model a new compare (including its predicate) that is not in the input IR, I'd appreciate it if we could discuss the VPCmpInst approach a bit more. At least, I'd like to understand the concerns about the VPCmpInst approach and what other people think.

I do have concerns regarding modeling ICmpULE as an opcode only for compare instructions newly created during a VPlan-to-VPlan transformation. For example:

...

Aug 26 2018, 7:21 AM

Aug 23 2018

Ayal added inline comments to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.
Aug 23 2018, 12:08 PM

Aug 22 2018

Ayal added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

...

Yes, the stores are scalarized. Identical replicas are left as-is. Either passes such as load elimination can remove them, or we can clean them up in LV itself.

  - by revisiting LoopVectorizationCostModel::collectLoopUniforms()? ;-)

Right now, I just run instcombine after loop vectorization to clean up those unnecessary stores (and the test cases make sure there's only one store left). It looks like there are other places in LV that rely on InstCombine as the cleanup pass, so it may not be that bad after all? Thoughts?

Aug 22 2018, 2:24 PM
Ayal added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 22 2018, 9:17 AM
Ayal updated the diff for D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

Addressing review comments, rebased, added a couple of asserts.

Aug 22 2018, 8:39 AM
Ayal added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 22 2018, 5:38 AM

Aug 20 2018

Ayal added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

...

Yes, the stores are scalarized. Identical replicas are left as-is. Either passes such as load elimination can remove them, or we can clean them up in LV itself.

Aug 20 2018, 4:07 PM
Ayal added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 20 2018, 3:08 PM
Ayal updated the diff for D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

Addressed review comments.

Aug 20 2018, 3:01 PM
Ayal accepted D50778: [LV] Vectorize loops where non-phi instructions used outside loop.
Aug 20 2018, 1:16 PM

Aug 19 2018

Ayal added inline comments to D50778: [LV] Vectorize loops where non-phi instructions used outside loop.
Aug 19 2018, 1:02 AM

Aug 15 2018

Ayal added a comment to D50778: [LV] Vectorize loops where non-phi instructions used outside loop.

I suggest updating InnerLoopVectorizer::fixLCSSAPHIs() as follows, now that arbitrary values are allowed to be live-out:

unsigned LastLane = Cost->isUniformAfterVectorization(IncomingValue, VF) ? 0 : VF - 1;
Value *lastIncomingValue =
    getOrCreateScalarValue(IncomingValue, {UF - 1, LastLane});
Aug 15 2018, 3:15 PM
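The lane choice in the suggested fixLCSSAPHIs change can be modeled as follows: with VF lanes and UF unrolled parts, the value of the last scalar iteration lives in part UF - 1, lane VF - 1, while a value that is uniform after vectorization holds the same value in every lane, so lane 0 suffices. The Parts layout and liveOutValue are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parts[P][L] holds the value computed for scalar iteration Base + P*VF + L
// of the final vector iteration (hypothetical layout).
int liveOutValue(const std::vector<std::vector<int>>& Parts, bool IsUniform) {
  assert(!Parts.empty() && !Parts.back().empty());
  const std::vector<int>& LastPart = Parts.back();              // part UF - 1
  std::size_t LastLane = IsUniform ? 0 : LastPart.size() - 1;   // lane 0 or VF - 1
  return LastPart[LastLane];
}
```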
Ayal added a comment to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

I have a general question about direction, not specific to this patch.

It seems like we're adding a specific form of predication to the vectorizer in this patch and I know we already have support for various predicated load and store idioms. What are our plans in terms of supporting more general predication? For instance, I don't believe we handle loops like the following at the moment:

for (int i = 0; i < N; i++) {
  if (unlikely(i > M))
    break;
  sum += a[i];
}

Can the infrastructure in this patch be generalized to handle such cases? And if so, are there any specific plans to do so?

Aug 15 2018, 12:54 PM
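For the specific loop above, the exit condition depends only on the induction variable, so, as a sketch, the early break can be folded into a new trip count min(N, M + 1) (clamped at zero), leaving a single-exit loop. This covers only such IV-based exits, not general early exits; it assumes M < INT_MAX so that M + 1 does not overflow, and drops unlikely() since that is only a branch-probability hint. Both functions are hypothetical illustrations.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Original form: exits early once i > M.
int sumWithBreak(const std::vector<int>& a, int N, int M) {
  int sum = 0;
  for (int i = 0; i < N; i++) {
    if (i > M)
      break;
    sum += a[i];
  }
  return sum;
}

// Equivalent single-exit form: the IV-only exit becomes a new trip count.
int sumWithTripCount(const std::vector<int>& a, int N, int M) {
  int TC = std::max(0, std::min(N, M + 1));
  int sum = 0;
  for (int i = 0; i < TC; i++)
    sum += a[i];
  return sum;
}
```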

Aug 14 2018

Ayal added a comment to D50665: [LV][LAA] Vectorize loop invariant values stored into loop invariant address.

The decision on how to vectorize invariant stores also deserves attention: LoopVectorizationCostModel::setCostBasedWideningDecision() considers loads from uniform addresses, but not invariant stores. These may end up being scalarized or becoming a scatter; the former is preferred in this case, as the identical scalarized replicas can later be removed. In any case, associated cost estimates should be provided to support the overall vectorization cost. Note that vectorizing conditional invariant stores deserves special attention. Unconditional invariant stores are candidates to be sunk out of the loop, preferably before trying to vectorize it. One approach to vectorizing a conditional invariant store is to check whether its mask is all false and, if not, to perform a single invariant scalar store, for lack of a masked-scalar-store instruction. It may be worth distinguishing between uniform and divergent conditions; this check is easier to carry out in the former case.

Aug 14 2018, 1:53 PM
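The approach described for conditional invariant stores (check whether the mask is all false, otherwise perform a single scalar store) can be sketched on plain containers. This models the divergent-condition case, which needs an any-of reduction over the lanes; with a uniform condition the check would reduce to one scalar test. The function names are hypothetical, and the trip count is assumed to be a multiple of VF.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar loop: for (i) if (c[i]) *p = v;   with p and v loop invariant.
int scalarConditionalStore(const std::vector<bool>& c, int v, int init) {
  int p = init;
  for (bool ci : c)
    if (ci)
      p = v;
  return p;
}

// Vectorized sketch: per vector iteration, if the mask is not all false,
// perform one scalar store of the invariant value.
int vectorConditionalStore(const std::vector<bool>& c, int v, int init,
                           std::size_t VF) {
  assert(VF > 0 && c.size() % VF == 0);
  int p = init;
  for (std::size_t i = 0; i < c.size(); i += VF) {
    bool AnyActive = false;                // any-of reduction over the mask
    for (std::size_t L = 0; L < VF; ++L)
      AnyActive = AnyActive || c[i + L];
    if (AnyActive)
      p = v;                               // single invariant scalar store
  }
  return p;
}
```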
Ayal accepted D50579: [LV] Teach about non header phis that have uses outside the loop.
Aug 14 2018, 7:52 AM
Ayal added a comment to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.

Do you see any potential issue that could make modeling this in the VPlan native path complicated once we have predication?

Aug 14 2018, 12:25 AM

Aug 13 2018

Ayal added a comment to D50579: [LV] Teach about non header phis that have uses outside the loop.

Overall looks good to me, though it could be cleaned up a bit more?

Aug 13 2018, 2:12 PM

Aug 12 2018

Ayal added inline comments to D50579: [LV] Teach about non header phis that have uses outside the loop.
Aug 12 2018, 3:03 PM

Aug 11 2018

Ayal added inline comments to D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 11 2018, 2:06 PM
Ayal added inline comments to D50474: [LV] Vectorize header phis that feed from if-convertable latch phis.
Aug 11 2018, 2:11 AM

Aug 9 2018

Ayal added inline comments to D50474: [LV] Vectorize header phis that feed from if-convertable latch phis.
Aug 9 2018, 3:27 PM

Aug 8 2018

Ayal created D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size.
Aug 8 2018, 3:11 PM

Jun 14 2018

Ayal added inline comments to D48048: [LV] Prevent LV to run cost model twice for VF=2.
Jun 14 2018, 3:16 PM

Jun 12 2018

Ayal added inline comments to D48048: [LV] Prevent LV to run cost model twice for VF=2.
Jun 12 2018, 1:02 PM

May 1 2018

Ayal added inline comments to D46126: [SLP] Vectorize transposable binary operand bundles.
May 1 2018, 12:31 PM
Ayal added inline comments to D46126: [SLP] Vectorize transposable binary operand bundles.
May 1 2018, 8:46 AM

Apr 30 2018

Ayal added a comment to D46126: [SLP] Vectorize transposable binary operand bundles.

This is reminiscent of LV's interleave group optimization, in the sense that a couple of correlated inefficient vector "gathers" are replaced by a couple of efficiently formed vectors followed by transposing shuffles. The correlated gathers may come from the two operands of a binary operation, as in this patch, or more generally from arbitrary leaves of the SLP tree.

Apr 30 2018, 8:48 AM
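The analogy with interleave groups can be made concrete for a bundle {a[0] + a[1], a[2] + a[3]}: instead of gathering {a[0], a[2]} and {a[1], a[3]}, load the two consecutive pairs and transpose them with two shuffles. shuffle mimics shufflevector's indexing over the concatenation of its two operands; both helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// shufflevector-like helper: indices < A.size() select from A, others from B.
std::vector<int> shuffle(const std::vector<int>& A, const std::vector<int>& B,
                         const std::vector<std::size_t>& Mask) {
  std::vector<int> R;
  for (std::size_t Idx : Mask) {
    assert(Idx < A.size() + B.size());
    R.push_back(Idx < A.size() ? A[Idx] : B[Idx - A.size()]);
  }
  return R;
}

// Computes {a[0] + a[1], a[2] + a[3]} from two consecutive loads plus two
// transposing shuffles, instead of two gathers.
std::vector<int> transposedAdd(const std::vector<int>& a) {
  assert(a.size() == 4);
  std::vector<int> V0 = {a[0], a[1]}, V1 = {a[2], a[3]};  // consecutive loads
  std::vector<int> L = shuffle(V0, V1, {0, 2});           // {a[0], a[2]}
  std::vector<int> R = shuffle(V0, V1, {1, 3});           // {a[1], a[3]}
  return {L[0] + R[0], L[1] + R[1]};
}
```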

Apr 1 2018

Ayal accepted D43776: [SLP] Fix PR36481: vectorize reassociated instructions..

Looks good to me, thanks for addressing the issues, have only a few last minor suggestions.

Apr 1 2018, 12:31 AM

Mar 23 2018

Ayal added a comment to D43776: [SLP] Fix PR36481: vectorize reassociated instructions..

Add test(s) for extractvalues, for completeness.
Make sure the tests cover best-order selection: cases where the original order is just as frequent as other orders (tie-break), less frequent, and more frequent.

Mar 23 2018, 4:58 PM
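A hypothetical best-order selection that such tests would exercise: count how often each operand order occurs and pick the most frequent one, letting the original (identity) order win ties. This illustrates the policy the tests should cover; it is not the SLP vectorizer's actual implementation.

```cpp
#include <cassert>
#include <map>
#include <vector>

using Order = std::vector<unsigned>;

// Most frequent order wins; on a tie (or with no observations) the
// original identity order is kept.
Order bestOrder(const std::vector<Order>& Observed, const Order& Identity) {
  std::map<Order, unsigned> Count;
  for (const Order& O : Observed)
    ++Count[O];
  Order Best = Identity;
  auto It = Count.find(Identity);
  unsigned BestCount = It == Count.end() ? 0 : It->second;
  for (const auto& Entry : Count)
    if (Entry.second > BestCount) {  // strictly greater: ties keep Identity
      Best = Entry.first;
      BestCount = Entry.second;
    }
  return Best;
}
```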

Mar 18 2018

Ayal added a comment to D44523: Change calculation of MaxVectorSize.

See MaximizeBandwidth, as in the llvm-dev thread "Enable vectorizer-maximize-bandwidth by default?" and the patch "Enable vectorizer-maximize-bandwidth by default", which is still reverted AFAIK (r306936 - Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default.").

Mar 18 2018, 1:52 PM
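Roughly, the maximum vector size is the widest register width divided by an element width: by default the widest scalar type in the loop, and with vectorizer-maximize-bandwidth the smallest one, so narrow types can fill the register. This sketch omits any register-pressure limits LV may apply to the larger candidate VFs; the function is hypothetical.

```cpp
#include <cassert>

// All widths in bits. With MaximizeBandwidth the smallest element type
// determines the VF; otherwise the widest one does.
unsigned maxVectorSize(unsigned WidestRegisterBits, unsigned SmallestTypeBits,
                       unsigned WidestTypeBits, bool MaximizeBandwidth) {
  assert(SmallestTypeBits > 0 && SmallestTypeBits <= WidestTypeBits);
  unsigned ElemBits = MaximizeBandwidth ? SmallestTypeBits : WidestTypeBits;
  return WidestRegisterBits / ElemBits;
}
```

For a 256-bit register and a loop mixing i8 and i32, this gives 8 by default and 32 with bandwidth maximization.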

Mar 9 2018

Ayal added inline comments to D43776: [SLP] Fix PR36481: vectorize reassociated instructions..
Mar 9 2018, 2:44 PM

Mar 7 2018

Ayal added a comment to D43776: [SLP] Fix PR36481: vectorize reassociated instructions..

This patch addresses the following TODO, plus handles extracts:

Mar 7 2018, 11:39 PM

Feb 27 2018

Ayal resigned from D43812: [LV] Let recordVectorLoopValueForInductionCast to check if IV was created from the cast..
Feb 27 2018, 12:01 PM

Feb 21 2018

Ayal resigned from D43536: [LV] Fix for PR36311, vectorizer's isUniform() abuse triggers assert in SCEV.
Feb 21 2018, 3:26 AM

Feb 10 2018

Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Hi Ayal, Sanjoy,

The review of the last update has been pending for a long time. Of late, SLP has had lots of changes, so I will have to rebase; but before rebasing, please see whether any more changes are required in its current form.

Thanks in advance.

Feb 10 2018, 11:48 PM

Feb 1 2018

Ayal added inline comments to D42123: Derive GEP index type from Data Layout.
Feb 1 2018, 2:49 AM

Jan 15 2018

Ayal added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Jan 15 2018, 11:49 PM

Jan 11 2018

Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..
In D36130#971181, @Ayal wrote:

This should fix the case observed by @sanjoy in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171218/511721.html; please also include a testcase.

The test case, test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll, is already included.

Jan 11 2018, 1:15 PM

Jan 9 2018

Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..

This should fix the case observed by @sanjoy in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171218/511721.html; please also include a testcase.

Jan 9 2018, 8:40 AM

Dec 29 2017

Ayal added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Dec 29 2017, 7:31 AM

Dec 21 2017

Ayal added inline comments to D36130: [SLP] Vectorize jumbled memory loads..
Dec 21 2017, 3:25 AM

Dec 19 2017

Ayal added a comment to D41324: [SLPVectorizer] Add shuffle instruction cost for jumbled load.

Presumably this fixes the reported regressions?

Dec 19 2017, 2:44 AM

Dec 14 2017

Ayal accepted D38948: [LV] Support efficient vectorization of an induction with redundant casts.

Just to formally close this review, as it wasn't closed by the commit.

Dec 14 2017, 11:33 AM

Dec 11 2017

Ayal accepted D36130: [SLP] Vectorize jumbled memory loads..

This looks good to me, with a couple of last minor fixes.

Dec 11 2017, 2:51 PM

Dec 10 2017

Ayal added a comment to D38948: [LV] Support efficient vectorization of an induction with redundant casts.

This looks good to me, but please wait for Silviu to approve as well before committing.

Dec 10 2017, 10:25 AM

Dec 9 2017

Ayal accepted D40883: [LV] Ignore the cost of values that will not appear in the vectorized loop.

LGTM, thanks for taking this out of D38948 and into a separate commit.

Dec 9 2017, 1:57 PM
Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..
In D36130#945728, @Ayal wrote:

Good catch. Add a LIT test?

It was asserting in a few of the LNT MultiSource benchmarks. How can I extract it into a LIT test?

Dec 9 2017, 1:30 PM

Dec 7 2017

Ayal abandoned D28975: [LV] Introducing VPlan to model the vectorized code and drive its transformation.

Hi Ayal,

This functionality has been submitted already, right? If so, please close this review.

Thanks,
--renato

Dec 7 2017, 8:21 AM

Dec 5 2017

Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Good catch. Add a LIT test?

Dec 5 2017, 2:41 PM

Nov 28 2017

Ayal added a comment to D39346: [LV] [ScalarEvolution] Fix PR34965 - Cache pointer stride information before LV code gen.

Nice catch! Continuing to use SCEV and expect consistent answers in the midst of restructuring the IR is indeed wrong; its cache is not meant to cover for cases that can no longer be analyzed as before.

Nov 28 2017, 3:27 AM

Nov 19 2017

Ayal added inline comments to D38948: [LV] Support efficient vectorization of an induction with redundant casts.
Nov 19 2017, 10:31 AM

Nov 16 2017

Ayal added a comment to D38948: [LV] Support efficient vectorization of an induction with redundant casts.

A few additional minor comments.

Nov 16 2017, 2:39 AM

Nov 2 2017

Ayal added a comment to D38785: [LV/LAA] Avoid specializing a loop for stride=1 when this predicate implies a single-iteration loop.

In short, I think we are all in agreement that:

  1. This patch is a (small) improvement, *regardless of the users and their specific cost considerations*.
  2. The cost-model aspect of deciding when to specialize for a certain stride needs to be improved. The users (LV especially) are currently not making informed decisions.

…Right?
Nov 2 2017, 2:47 PM
Ayal added inline comments to D38948: [LV] Support efficient vectorization of an induction with redundant casts.
Nov 2 2017, 9:33 AM

Nov 1 2017

Ayal added a comment to D38785: [LV/LAA] Avoid specializing a loop for stride=1 when this predicate implies a single-iteration loop.

Indeed, a loop with an iteration count smaller than VF is definitely not worth vectorizing. An interesting profitability issue is deciding how many iterations past VF suffice to amortize vectorization overheads. In any case, this single/no-iteration case looks like a no-brainer, and a realistic one: traversing a column of an NxN matrix.

Nov 1 2017, 5:47 PM
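The matrix-column case can be made concrete: summing column j of an N x N row-major matrix uses stride N, which is also the trip count, so a loop version specialized for stride == 1 only runs when N == 1, i.e. for a single iteration, fewer than any VF. columnSum is a hypothetical illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of column j of an N x N row-major matrix. Consecutive iterations are
// N elements apart, and the loop runs N times, so "stride == 1" implies a
// single-iteration loop.
long long columnSum(const std::vector<int>& A, unsigned N, unsigned j) {
  assert(A.size() == static_cast<std::size_t>(N) * N && j < N);
  long long Sum = 0;
  for (unsigned i = 0; i < N; ++i)
    Sum += A[i * N + j];  // stride N between iterations
  return Sum;
}
```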

Oct 2 2017

Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..

The regression test result is as follows. There are 2 failures coming from Clang; however, these failures are also observed without this patch.

Oct 2 2017, 2:55 PM

Sep 29 2017

Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..
Sep 29 2017, 12:47 PM

Sep 27 2017

Ayal created D38339: [LV] Fix PR34711 - handle widening of instruction ranges in the presence of sinking casts.
Sep 27 2017, 4:53 PM
Ayal created D38338: [LV] Fix PR34743 - handle casts that sink after interleaved loads.
Sep 27 2017, 4:24 PM

Sep 26 2017

Ayal added a comment to rL311849: [LV] Fix PR34248 - recommit D32871 after revert r311304.

Can you provide a reproducer? Best is to open a PR and continue this discussion there. See e.g. https://bugs.llvm.org/show_bug.cgi?id=34711, which also contains a suggested fix to be upstreamed, which might apply to your case as well.

Sep 26 2017, 3:42 PM

Sep 19 2017

Ayal accepted D36130: [SLP] Vectorize jumbled memory loads..

Agreed, revisiting the ReverseConsecutive/NumLoadsWantToChangeOrder/shouldReorder() logic in view of general shuffled loads deserves a separate patch.

Sep 19 2017, 10:19 AM
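The idea behind vectorizing jumbled loads can be illustrated as follows: if the element offsets of a bundle of scalar loads form a permutation of a consecutive range, one consecutive vector load followed by a shuffle with the derived mask can replace them. jumbledLoadMask is a hypothetical helper; offsets are assumed to be in elements relative to a common base pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Returns the shuffle mask that reorders one consecutive vector load into
// bundle order, or std::nullopt if the offsets are not a permutation of a
// consecutive range (a gap or a duplicate).
std::optional<std::vector<std::size_t>> jumbledLoadMask(const std::vector<long>& Offsets) {
  if (Offsets.empty())
    return std::nullopt;
  long Min = *std::min_element(Offsets.begin(), Offsets.end());
  std::vector<bool> Seen(Offsets.size(), false);
  std::vector<std::size_t> Mask;
  for (long Off : Offsets) {
    std::size_t Rel = static_cast<std::size_t>(Off - Min);  // Off >= Min
    if (Rel >= Offsets.size() || Seen[Rel])
      return std::nullopt;  // gap or duplicate: not a plain permutation
    Seen[Rel] = true;
    Mask.push_back(Rel);
  }
  return Mask;
}
```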

Sep 16 2017

Ayal added a comment to D35498: [LoopVectorizer] Use two step casting for float to pointer types..

This was closed due to committing r312331, right? Code LGTM, for the record. Tests for interleaved loads of float/pointer should still be added, as this patch presumably handles them too.

Sep 16 2017, 6:17 AM

Sep 13 2017

Ayal added inline comments to D32729: LV: Don't vectorize with unknown loop counts on divergent targets.
Sep 13 2017, 1:46 PM

Sep 12 2017

Ayal accepted D37507: Fix maximum legal VF calculation.
Sep 12 2017, 9:05 AM
Ayal accepted D37702: [LV] Clamp the VF to the trip count.

@Ayal, any other comments or does this look good to go? Thanks.

Sep 12 2017, 7:51 AM
Ayal added inline comments to D37702: [LV] Clamp the VF to the trip count.
Sep 12 2017, 6:40 AM

Sep 11 2017

Ayal added inline comments to D37702: [LV] Clamp the VF to the trip count.
Sep 11 2017, 2:05 PM
Ayal added a reviewer for D37507: Fix maximum legal VF calculation: mkuper.
Sep 11 2017, 12:29 AM
Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..
In D36130#863237, @Ayal wrote:

Yes, thanks, the example is clear. I agree that having OpdNums allows representing such cases, where two shuffles from the same load are to feed two distinct operands of the same user. But the SLP vectorizer with this patch alone will not optimize such patterns, right? Take for example jumbled-load.ll above; to match the case depicted in the pdf, replace the zeroes used as the 2nd operands of the cmps and/or of the selects with a shuffle (the same as that of the 1st operand, or a different one).

Yes, that's right. However, right now I am not clear on what needs to be done to optimize those patterns. I also tried another simple case, where 2 different shuffles from the same LOAD feed into a MUL. Here, for VF=4, SLP reports "Scalar used twice in bundle" and removes redundant scalar operations instead of vectorizing. Consequently, the STOREs of the results of the MULs do not get vectorized. Maybe we need to make a trade-off between scalar and vector code here.

Sep 11 2017, 12:28 AM
Ayal added a comment to D37507: Fix maximum legal VF calculation.

My only additional comment: does the test need to be X86/skx-specific, or can it be placed in, say, memdep.ll?

Sep 11 2017, 12:17 AM

Sep 10 2017

Ayal added inline comments to D37425: LoopVectorize: MaxVF should not be larger than the loop trip count.
Sep 10 2017, 1:24 AM
Ayal added inline comments to D37507: Fix maximum legal VF calculation.
Sep 10 2017, 12:40 AM

Sep 8 2017

Ayal created D37619: [LV] Fix PR34523 - avoid generating redundant selects.
Sep 8 2017, 3:30 AM

Sep 7 2017

Ayal added a comment to D36130: [SLP] Vectorize jumbled memory loads..

Sep 7 2017, 4:45 AM