- User Since
- Jul 12 2015, 1:48 PM (157 w, 3 d)
Jun 14 2018
Jun 12 2018
May 1 2018
Apr 30 2018
This is reminiscent of LV's interleave group optimization, in the sense that a couple of correlated inefficient vector "gathers" are replaced by a couple of efficiently formed vectors followed by transposing shuffles. The correlated gathers may come from the two operands of a binary operation, as in this patch, or more generally from arbitrary leaves of the SLP tree.
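To make the analogy concrete, here is a minimal sketch in plain C++ (not LLVM code) of replacing two correlated strided gathers with two consecutive loads followed by transposing shuffles. The names `Vec4`, `shuffle`, `gatherStrided`, `loadConsecutive` and `Pair` are hypothetical and exist only for this illustration; `shuffle` merely models the lane-selection semantics of a two-input shuffle mask.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;

// Models a two-input shuffle: lane I of the result is lane Mask[I] of the
// 8-element concatenation of A and B.
Vec4 shuffle(const Vec4 &A, const Vec4 &B, const std::array<int, 4> &Mask) {
  Vec4 R{};
  for (std::size_t I = 0; I < 4; ++I) {
    assert(Mask[I] >= 0 && Mask[I] < 8 && "mask index out of range");
    R[I] = Mask[I] < 4 ? A[Mask[I]] : B[Mask[I] - 4];
  }
  return R;
}

// Inefficient form: a strided "gather" of four elements.
Vec4 gatherStrided(const int *Base, std::size_t Stride) {
  return {Base[0], Base[Stride], Base[2 * Stride], Base[3 * Stride]};
}

// Efficient form: one consecutive load of four elements.
Vec4 loadConsecutive(const int *Base) {
  return {Base[0], Base[1], Base[2], Base[3]};
}

struct Pair {
  Vec4 Even, Odd;
};

// Two correlated gathers over the same eight elements.
Pair deinterleaveWithGathers(const int *Mem) {
  return {gatherStrided(Mem, 2), gatherStrided(Mem + 1, 2)};
}

// The same result from two consecutive loads plus two transposing shuffles.
Pair deinterleaveWithShuffles(const int *Mem) {
  Vec4 Lo = loadConsecutive(Mem);
  Vec4 Hi = loadConsecutive(Mem + 4);
  return {shuffle(Lo, Hi, {0, 2, 4, 6}), shuffle(Lo, Hi, {1, 3, 5, 7})};
}
```

Both functions read the same eight elements. The second form pays off only because the two gathers are correlated, so the two wide loads are shared between them.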
Apr 1 2018
Looks good to me, thanks for addressing the issues, have only a few last minor suggestions.
Mar 23 2018
Add test(s) for extractvalue instructions as well, for completeness.
Make sure tests cover best-order selection: cases where original order is just as frequent as other orders (tie-break), less frequent, more frequent.
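The three cases can be illustrated with a small standalone sketch of frequency-based order selection. This is not the patch's code: `selectBestOrder` and `Order` are hypothetical names, and the tie-break policy (keep the original order unless another order is strictly more frequent) is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <vector>

using Order = std::vector<unsigned>;

// Returns the most frequent order among Observed. The original (identity)
// order wins ties: another order must be strictly more frequent to replace
// it. Ties among non-original orders go to the lexicographically smallest.
Order selectBestOrder(const std::vector<Order> &Observed, std::size_t Width) {
  Order Original(Width);
  std::iota(Original.begin(), Original.end(), 0u);

  // Count how often each order occurs.
  std::map<Order, unsigned> Count;
  for (const Order &O : Observed) {
    assert(O.size() == Width && "every order must cover all lanes");
    ++Count[O];
  }

  // Start from the original order and its count (zero if never observed).
  Order Best = Original;
  auto It = Count.find(Original);
  unsigned BestCount = It == Count.end() ? 0 : It->second;
  for (const auto &Entry : Count) {
    if (Entry.second > BestCount) {
      Best = Entry.first;
      BestCount = Entry.second;
    }
  }
  return Best;
}
```

With the identity `{0, 1, 2, 3}` and its reverse as the only candidates, a 1:1 tie keeps the original, 1:2 picks the reverse, and 2:1 keeps the original. These are the three situations the tests should exercise.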
Mar 18 2018
See MaximizeBandwidth, as in:
llvm-dev thread: Enable vectorizer-maximize-bandwidth by default?
patch: Enable vectorizer-maximize-bandwidth by default.
The patch is still reverted afaik: r306936 - Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."
Mar 9 2018
Mar 7 2018
This patch addresses the following TODO, plus handles extracts:
Feb 27 2018
Feb 21 2018
Feb 10 2018
Feb 1 2018
Jan 15 2018
Jan 11 2018
Jan 9 2018
This should fix the case observed by @sanjoy in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171218/511721.html; please also include a testcase.
Dec 29 2017
Dec 21 2017
Dec 19 2017
Presumably this fixes the reported regressions?
Dec 14 2017
Just to formally close this review, as it wasn't closed by the commit.
Dec 11 2017
This looks good to me, with a couple of last minor fixes.
Dec 10 2017
This looks good to me, but please wait for Silviu to approve as well before committing.
Dec 9 2017
LGTM, thanks for taking this out of D38948 and into a separate commit.
Dec 7 2017
Dec 5 2017
Good catch. Add a LIT test?
Nov 28 2017
Nice catch! Continuing to use SCEV and expect consistent answers in the midst of restructuring the IR is indeed wrong; its cache is not meant to cover for cases that can no longer be analyzed as before.
Nov 19 2017
Nov 16 2017
A few additional minor comments.
Nov 2 2017
Nov 1 2017
Indeed, a loop with an iteration count smaller than VF is definitely not worth vectorizing. An interesting profitability issue is to decide how many iterations past VF suffice to amortize vectorization overheads. In any case, this single/no iteration case looks like a no-brainer, and a realistic one: traversing a column of an NxN matrix.
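A back-of-the-envelope sketch of the arithmetic, assuming a known trip count and ignoring everything else a real cost model weighs (the names `IterationSplit`, `splitIterations` and `hasVectorWork` are hypothetical):

```cpp
#include <cassert>

// How a loop with TripCount iterations divides between the vector body,
// which handles VF iterations at a time, and the scalar remainder.
struct IterationSplit {
  unsigned VectorIters; // number of full vector iterations
  unsigned ScalarIters; // leftover iterations run in scalar form
};

IterationSplit splitIterations(unsigned TripCount, unsigned VF) {
  assert(VF > 0 && "vectorization factor must be positive");
  return {TripCount / VF, TripCount % VF};
}

// With TripCount < VF the vector body would never execute, so the
// vectorization overhead cannot be amortized at all.
bool hasVectorWork(unsigned TripCount, unsigned VF) {
  return splitIterations(TripCount, VF).VectorIters > 0;
}
```

For example, a trip count of 3 with VF = 4 yields zero vector iterations, so only overhead is added. The harder question raised above, how many iterations past VF are enough, is not answered by this check.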
Oct 2 2017
Sep 29 2017
Sep 27 2017
Sep 26 2017
Can you provide a reproducer? Best is to open a PR and continue this discussion there. See e.g. https://bugs.llvm.org/show_bug.cgi?id=34711, which also contains a suggested fix to be upstreamed, which might apply to your case as well.
Sep 19 2017
Agreed, revisiting the ReverseConsecutive/NumLoadsWantToChangeOrder/shouldReorder() logic in view of general shuffled loads deserves a separate patch.
Sep 16 2017
This was closed due to committing r312331, right? Code LGTM, for the record. Tests for interleaved loads of float/pointer should still be added, as this patch presumably handles them too.
Sep 13 2017
Sep 12 2017
@Ayal, any other comments or does this look good to go? Thanks.
Sep 11 2017
Only additional comment is whether the test needs to be X86/skx specific, or whether it can be placed in, say, memdep.ll
Sep 10 2017
Sep 8 2017
Sep 7 2017
Sep 4 2017
Sep 1 2017
Aug 31 2017
- Moved tests to Codegen/ARM. The crash disappears without the arm tuple, so it can't be removed.
Aug 30 2017
Aug 28 2017
Aug 23 2017
Fix PR34248: pack a predicated scalar into a vector only when vectorizing; avoid doing so when only unrolling. Add a test derived from the reproducer of PR34248.
Aug 22 2017
Aug 20 2017
@sbaranga, can you clarify my comments and help me understand this better? Hopefully this could move forward.
sortMemAccesses() is analogous to the formation of InterleaveGroups in the LoopVectorizer, which also scans a collection of Loads (or Stores) to determine if they are adjacent in some order and can be combined into one Vector Load of a given width; and if so, in what order. This requires a single scan to compute the distances relative to the first access, as done here. But knowing that we're looking for a permutation of a given width, we can more easily sort the accesses as they are entered into a map, holding the minimum and maximum indices. See insertMember() there.
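A rough standalone sketch of the suggested approach follows. It is only modeled on the idea, not copied from the LoopVectorizer: `AccessGroup`, its `insertMember` and `sortAccesses` are hypothetical names here, and offsets are assumed to be already computed in element units relative to the first access.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Collects accesses keyed by their offset from the first access, tracking
// the minimum and maximum offsets so that an access which cannot belong to
// a group of the given width is rejected on insertion.
class AccessGroup {
public:
  explicit AccessGroup(std::size_t Width) : Width(Width) {
    assert(Width > 0 && "group width must be positive");
  }

  // Returns false if Offset is already taken or would stretch the group
  // beyond Width consecutive elements.
  bool insertMember(std::int64_t Offset, unsigned Id) {
    if (Members.count(Offset))
      return false;
    std::int64_t NewMin = Members.empty() ? Offset : std::min(Min, Offset);
    std::int64_t NewMax = Members.empty() ? Offset : std::max(Max, Offset);
    if (NewMax - NewMin >= static_cast<std::int64_t>(Width))
      return false;
    Members[Offset] = Id;
    Min = NewMin;
    Max = NewMax;
    return true;
  }

  // The map is ordered by offset, so iterating it yields address order.
  std::vector<unsigned> idsInAddressOrder() const {
    std::vector<unsigned> Ids;
    for (const auto &M : Members)
      Ids.push_back(M.second);
    return Ids;
  }

private:
  std::size_t Width;
  std::int64_t Min = 0, Max = 0;
  std::map<std::int64_t, unsigned> Members;
};

// Single scan: succeeds only if the offsets are a permutation of
// Offsets.size() consecutive elements. Distinct offsets spanning fewer than
// Width elements are necessarily consecutive once Width of them are in.
bool sortAccesses(const std::vector<std::int64_t> &Offsets,
                  std::vector<unsigned> &Sorted) {
  if (Offsets.empty())
    return false;
  AccessGroup Group(Offsets.size());
  for (std::size_t I = 0; I < Offsets.size(); ++I)
    if (!Group.insertMember(Offsets[I], static_cast<unsigned>(I)))
      return false;
  Sorted = Group.idsInAddressOrder();
  return true;
}
```

For offsets `{0, 2, -1, 1}` this reports the access indices in address order as `{2, 0, 3, 1}`. A gap (`{0, 4, 1, 2}`) or a duplicate (`{0, 1, 1, 2}`) is rejected during insertion, with no separate sorting pass.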
Aug 19 2017
Previous upload missed newly added VPlan.h and VPlan.cpp, including them here. This is the version that was committed.
Aug 16 2017
Uploading the version updated to top of trunk before committing, including merging with SinkAfter patch D33058 by reordering ingredients before constructing recipes for them.
Aug 13 2017
Is it worth adding a test case for this? I'm not sure...
Aug 8 2017
Looks good to me, please wait for @mssimpso to approve.
Aug 7 2017
Aug 4 2017
Aug 2 2017
Jul 27 2017
Instead of refraining from vectorizing a loop which has an externally used phi (or rather the bump thereof) and any predicate, can a predicate be added (or an existing one be extended) to also cover the last iteration? Pity to bail out on such corner cases.
Jul 21 2017
Jul 19 2017
Jul 18 2017
Jul 12 2017
Added the following comment following review, and updated a testcase that was recently modified (if-conversion-nest.ll)
Jul 11 2017
Jul 3 2017
Patch updated to llvm trunk, adapted to the new ValueMap interface of D34473. ValueMap is extracted to a standalone struct VectorizerValueMap.
Jun 28 2017
Updated version includes the comment requested by @hfinkel.
Jun 24 2017
Updated version addresses review comments.
Jun 21 2017
Yes, we saw a couple of ~7% improvements running eembc benchmarks on x86.
An alternative interface which may be simpler and clearer ... it may be better done in a separate patch.
Jun 19 2017
Jun 18 2017
Jun 14 2017
This has been conceptually approved, but is pending:
Jun 13 2017
Jun 12 2017
Updated following review comments.
Jun 7 2017
(Addressing review comments; to be completed.)
Done. Will upload updated version shortly.