This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Vectorize jumbled memory loads.
AcceptedPublic

Authored by ashahid on Aug 1 2017, 12:31 AM.

Details

Summary

This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.

This patch fixes the mask representation by recording the 'use mask' in the usertree entry.

Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Patch update for fixing build bot failure:

This fix makes the place holder for Shuffle Mask from fixed array of 3 element
to an std::map. This need arises from the fact that a PHI node can have
any number of operand as incoming value.

Test performed:
LLVM lit test, 3 stage bootstrap build and LNT (Thanks to Hans and Daniel)

hans added a comment.Dec 5 2017, 10:18 AM

Patch update for fixing build bot failure:

I haven't looked at the patch at all, but I just tried it on a local Chrome build on Linux, and it seems to work for that.

Ayal added a comment.Dec 5 2017, 2:40 PM

Good catch. Add a LIT test?

lib/Transforms/Vectorize/SLPVectorizer.cpp
743

The fixed array

SmallVector<unsigned, 4> ShuffleMask[3];

of the previous version indeed cannot account for all operands. How about holding a

SmallVector<SmallVector<unsigned, 4>, 2> ShuffleMask;

instead of holding a map from 0,1,2,..,numOperands ?

771

Are both conditions really needed, or suffice say to check for -1 and assert positive indices are not too large?

3065

May be simpler to check instead ShuffleMask.count(OpdNum)

3093

clang-format

In D36130#945306, @hans wrote:

Patch update for fixing build bot failure:

I haven't looked at the patch at all, but I just tried it on a local Chrome build on Linux, and it seems to work for that.

Thanks Hans for triage.

In D36130#945728, @Ayal wrote:

Good catch. Add a LIT test?

It was asserting in few of LNT Multisource bench mark. How to extract it for LIT test?

lib/Transforms/Vectorize/SLPVectorizer.cpp
743

I think this can be done. I will try.

771

Sure I will check. I am thinking 30000 as large indices threshold, do you have any number in mind?

3065

Quite right.

ashahid added inline comments.Dec 8 2017, 8:12 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
771

I tried but seems both conditions are needed as I am getting assertion "Idx < size()" for SmallVector<<SmallVector, 4> 2> ShuffleMask.

Updated the review comments.

Minor commented code clean up done.

Ayal added a comment.Dec 9 2017, 1:30 PM
In D36130#945728, @Ayal wrote:

Good catch. Add a LIT test?

It was asserting in few of LNT Multisource bench mark. How to extract it for LIT test?

Suffice to have a phi with 4 predecessors, where (at-least) the 4th needs a shuffle-mask.

lib/Transforms/Vectorize/SLPVectorizer.cpp
771

UserTreeIdx is the index of the User entry as we build the tree bottom-up, so it should always be between 0 and VectorizableTree.size()-1, except for -1 when creating the new entry for the root, which is User-less. So it should suffice to check if Idx is -1, and otherwise assert that Idx < size(), if desired, right?

772–773

Code below still uses emplace_back contrary to the discussion above.

May need to call UserTreeEntry->ShuffleMask.resize() if OpdNum is larger than its initial/current size, before setting UserTreeEntry->ShuffleMask[OpdNum] = tempMask. (Otherwise the original "LNT Multisource bench mark" asserts should trigger again?)

Suggest to add a test where the first operand does not need a shuffle but the second one does.

2817

See above discussion about replacing second condition with an assert.

3058–3069

ditto

test/Transforms/SLPVectorizer/X86/crash_cmpop.ll
1 ↗(On Diff #126266)

Why add -debug?

Review comments updated and added lit tests.

ashahid marked 3 inline comments as done.

Review comments updated and added lit tests.

ashahid added inline comments.Dec 11 2017, 8:41 AM
test/Transforms/SLPVectorizer/X86/crash_cmpop.ll
1 ↗(On Diff #126266)

My bad, not intended.

Ayal accepted this revision.Dec 11 2017, 2:51 PM

This looks good to me, with a couple of last minor fixes.

Hope it stays in this time...

lib/Transforms/Vectorize/SLPVectorizer.cpp
778

alrea[d]y

3098

Can simply do for (unsigned Entry : ShuffleMask[OpdNum]) instead of iterating explicitly over all lanes and retrieving each UserTreeEntry->ShuffleMask[OpdNum][Lane].

test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll
32

Suggested to also have a test where the 2nd operand is a shuffle but the 1st one isn't, which will fail if shuffles are added using emplace_back().

ashahid marked 3 inline comments as done.

Updated test and review comment.

Bootstrap and LNT test underway.

ashahid closed this revision.Dec 12 2017, 7:09 PM
eastig added a subscriber: eastig.Dec 14 2017, 5:32 AM

Hi Shahid,

These changes caused 27.7% and 30.2% regressions on an AArch64 Juno board (http://lnt.llvm.org/db_default/v4/nts/83681):

MultiSource/Benchmarks/mediabench/gsm/toast/toast: 30.20%
MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm: 27.73%

We have the same benchmarks regressed on our AArch64 boards (Cortex-A53, Cortex-A57).

-Evgeny Astigeevich
The ARM Compiler Optimisation team

Hi Shahid,

These changes caused 27.7% and 30.2% regressions on an AArch64 Juno board (http://lnt.llvm.org/db_default/v4/nts/83681):

MultiSource/Benchmarks/mediabench/gsm/toast/toast: 30.20%
MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm: 27.73%

We have the same benchmarks regressed on our AArch64 boards (Cortex-A53, Cortex-A57).

-Evgeny Astigeevich
The ARM Compiler Optimisation team

A problem report: https://bugs.llvm.org/show_bug.cgi?id=35673

sanjoy added a subscriber: sanjoy.Dec 19 2017, 4:03 PM
sanjoy added inline comments.
lib/Analysis/LoopAccessAnalysis.cpp
1125

This should be a cast<>.

1153

LLVM style is to avoid using curly braces on single like for loops. Using std::iota would be even better.

lib/Transforms/Vectorize/SLPVectorizer.cpp
779

I think you should be able to do:

auto &OperandMask = UserTreeEntry->ShuffleMask[OpdNum];
assert(OperandMask.empty());
OperandMask.insert(OperandMask.end(), ShuffleMask.begin(), ShuffleMask.end());
1668

Not sure why you need NewVL here -- doesn't just using Sorted work?

3064

Might be cleaner to abstract (unsigned)OpdNum < UserTreeEntry->ShuffleMask.size() && !UserTreeEntry->ShuffleMask[OpdNum].empty() into a UserTreeEntry->hasShuffleMaskForOp(Index) helper.

3099

The cast to Value * should not be necessary.

3327

dyn_cast<XXX>(f)->g() should never be necessary. Either the dyn_cast can return null in which case you should check for that, or it can't and you should use cast<>. Also the cast of Vec to Instruction seems unnecessary: ShuffleVectorInst is an Instruction.

Ayal added inline comments.Dec 21 2017, 3:25 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
779

While we're at it, this should move under the if (UserTreeIdx != -1) to avoid checking if &VectorizableTree[UserTreeIdx] is null, as commented in https://reviews.llvm.org/D41324#inline-361435

1677

Should probably also check here that UserTreeIdx is not -1, to avoid creating a mask for the root with no place to hang it, as @sanjoy observed.

ashahid added inline comments.Dec 22 2017, 6:20 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
779

If we check for if (UserTreeIdx != -1 && ShuffledLoad) before the call of newTreeEntry(), we can avoid "UserTreeIdx != -1" check completely inside newTreeEntry().

1677

Yes, I had planned to do exactly this.

ashahid reopened this revision.Dec 28 2017, 11:04 PM
ashahid marked 8 inline comments as done.
ashahid added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
3327

Here I am trying to ensure that the instructions are "ShuffleVectorInst" and "LoadInst" respectively.

Casting of Vec to Instruction, is to satisfy the membership of getOperand() which compiler otherwise report as error.

This revision is now accepted and ready to land.Dec 28 2017, 11:04 PM

Updates review comments.

Regression test and LNT passes, 3 stage bootstrap test underway.

Ayal added inline comments.Dec 29 2017, 7:31 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3327

Use isa instead of dyn_cast here:
if (Vec && dyn_cast<LoadInst>(cast<Instruction>(Vec)->getOperand(0))) {

or alternatively do something like:

Value *Vec = E->VectorizedValue;
assert(Vec && "Can't find vectorizable value");
if (ShuffleVectorInst *Shuffle = dyn_cast<ShuffleVectorInst>(Vec))
  if (LoadInst *Load = dyn_cast<LoadInst>(Shuffle->getOperand(0)))
    Vec = Load;

Updated Ayal's comment accordingly

ashahid marked an inline comment as done.Jan 1 2018, 8:01 AM
Ayal added a comment.Jan 9 2018, 8:40 AM

This should fix the case observed by @sanjoy in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171218/511721.html; please also include a testcase.

In D36130#971181, @Ayal wrote:

This should fix the case observed by @sanjoy in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171218/511721.html; please also include a testcase.

Test case, test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll, already included.

Ayal added a comment.Jan 11 2018, 1:13 PM
In D36130#971181, @Ayal wrote:

This should fix the case observed by @sanjoy in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171218/511721.html; please also include a testcase.

Test case, test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll, already included.

Ah, right, sorry, missed it.

This looks good to me, with only minor comments about the testcase.

Please see that @sanjoy approves too, as this mostly addresses issues he raised.

test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll
28

"SINK" is defined redundantly, as it is not used.

Could this be simplified by removing the float-to-int casts?

In general, it may suffice to check that there's no load of <4 x i32>, which would be jumbled. Checking that two of the lanes have been vectorized may be fragile, in case a modified cost model will decide it ain't worth it.

sanjoy accepted this revision.Jan 13 2018, 2:40 PM
sanjoy added inline comments.
lib/Analysis/LoopAccessAnalysis.cpp
1113

The indent looks off here; can you please run clang-format?

lib/Transforms/Vectorize/SLPVectorizer.cpp
723

Optional: you can write return X; instead of if (X) return true; return false;.

1675

Nit: s/usefull/useful/

3326–3331

I think you can rewrite this more cleanly using an immediately-invoked function expression:

Value *Vec = [&]() {
  if (auto *SVI = dyn_cast<ShuffleVectorInst>(E->VectorizedValue))
    if (auto *LI = dyn_cast<LoadInst>(SVI->getOperand(0)))
      return LI->getOperand(0);
  return E->VectorizedValue;
}();
ashahid marked an inline comment as not done.Jan 15 2018, 9:01 PM
ashahid added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
3326–3331

I tried this IIFE, however I am getting an assertion "Tried to create extractelement operation on non-vector type!" for jumbled-load-multiuse.ll test. Do you see any issue in this code?

sanjoy added inline comments.Jan 15 2018, 10:54 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3326–3331

Yes, I think I should have written:

Value *Vec = [&]() {
  if (auto *SVI = dyn_cast<ShuffleVectorInst>(E->VectorizedValue))
    if (isa<LoadInst>(SVI->getOperand(0)))
      return SVI->getOperand(0);
  return E->VectorizedValue;
}();
Ayal added inline comments.Jan 15 2018, 11:49 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3326–3331

Yes, this simplifies the below "alternatively do something like:"

Value *Vec = E->VectorizedValue;
assert(Vec && "Can't find vectorizable value");
if (ShuffleVectorInst *Shuffle = dyn_cast<ShuffleVectorInst>(Vec))
  if (LoadInst *Load = dyn_cast<LoadInst>(Shuffle->getOperand(0)))
    Vec = Load;

Updates test case and stylistic review comments

Hi Ayal, Sanjoy,

The last update's review was pending for long. Off late, SLP has lots of changes so I will have to rebase but before rebasing please see if any more changes required in its current form.

Thanks in advance.

Ayal added a comment.Feb 10 2018, 11:48 PM

Hi Ayal, Sanjoy,

The last update's review was pending for long. Off late, SLP has lots of changes so I will have to rebase but before rebasing please see if any more changes required in its current form.

Thanks in advance.

This looks good to me, as commented earlier, but please see that @sanjoy approves too, as this mostly addresses issues he raised.

sanjoy accepted this revision.Feb 11 2018, 12:18 PM

I don't have any more coding style comments. I've not reviewed the actual semantic changes.

lib/Analysis/LoopAccessAnalysis.cpp
1166

Can you use std::iota here?

ABataev added inline comments.Feb 12 2018, 8:04 AM
lib/Analysis/LoopAccessAnalysis.cpp
1112

This function can be used for stores also, it is better to make it universal for stores/loads.

1156

It is better to use stable_sort rather than sort

1169

stable_sort

lib/Transforms/Vectorize/SLPVectorizer.cpp
1663

Is it possible at all that VL has less than 4 elements here?

1668

i->I, e->E. Variables must have Camel-like names.

2231–2237

You don't need so many shuffles, it is enough just to have just one.

ABataev added inline comments.Feb 12 2018, 8:04 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
737–743

Why you can't have just one shuffle here for all external uses?

1679–1680

Bad decision. It is better to use original VL here, rather than Sorted and add an additional array of sorted indieces. In this case you don't need all these additional numbers and all that complex logic to find the correct tree entry for the list of values.

3326–3331

I think you can have default capture by value here rather than by reference.

Hi Alexey,

As I was trying to rebase this patch, it seems this overlaps with your "reverse load" patch. Could you take a look in this patch?

Hi Alexey,

Thanks for looking into it.I will update it accordingly.
BTW this patch is failing with its tests after the re-base on top of your patch. Do you foresee any conflicting code?

lib/Analysis/LoopAccessAnalysis.cpp
1112

I plan to do such improvement in separate patches.

lib/Transforms/Vectorize/SLPVectorizer.cpp
737–743

This is for in-tree multi uses of a single vector load where the uses has different masks/permutation.
This section of comment https://reviews.llvm.org/D36130#inline-326711
discussed it earlier. Also there is figure attached.

1663

I think yes, for example a couple of i64 loads considering minimum register width as 128-bit.

However, this check here was basically meant to indicate jumbled loads of size 2 is essentially a reversed load.

1679–1680

In fact earlier design in patch (https://reviews.llvm.org/D26905) was to use original VL, however there was counter argument to that which I don't remember exactly.

2231–2237

This is basically for multiple in-tree uses with different masks/permutation.

ABataev added a comment.EditedFeb 13 2018, 7:33 AM

Hi Alexey,

Thanks for looking into it.I will update it accordingly.
BTW this patch is failing with its tests after the re-base on top of your patch. Do you foresee any conflicting code?

Probably, it is hard to say exactly without looking at the result.

lib/Analysis/LoopAccessAnalysis.cpp
1112

I just suggest to make universal at the very beginning, that's it

lib/Transforms/Vectorize/SLPVectorizer.cpp
737–743

I still don't understand what's the problem here.

  1. You need to perform the loads in some order.
  2. You sort the loads to be in the sequntially direct order and perform the vector load starting from the lowest address.
  3. You reshuffle the loaded vector value to the original order.

That's it, you have your loads in the required order. Just one shuffle is required. Why do you need some more? Also, I don't understand why do you need so many changes, why do you need additional indicies etc.

1663

It is going to be handled by the reverse loads patch

1679–1680

It is better to use original VL here, otherwise it will end with a lot of troubles and will require the whole bunch of changes in the vectorization process to find the perfect match for the vector of vectorized values. I don't think it is a good idea to have a lot of changes accross the whole module to handle jumbled loads.

3065

Is this correct? E->Scalars[0] is exactly VL0

Updates review comments and a test case.

Minor clean up.

Hi Alexey,

Thanks for looking into it.I will update it accordingly.
BTW this patch is failing with its tests after the re-base on top of your patch. Do you foresee any conflicting code?

Hi Alexey,

Thanks for looking into it.I will update it accordingly.
BTW this patch is failing with its tests after the re-base on top of your patch. Do you foresee any conflicting code?

Probably, it is hard to say exactly without looking at the result.

No worry it was a merge issue, its fixed.

lib/Transforms/Vectorize/SLPVectorizer.cpp
737–743

Updated jumbled-load.ll captures this case where instead of gathering the second operand of MUL we can have required shuffle of the same loaded vector

1663

Yes, this check no more required.

1679–1680

In the context where we can have multiple user of loaded vector with different shuffle mask, the design is to represent these different shuffle mask for each user corresponding to the user's operand number. Having single sorted indices will not be sufficient for this.
Given the objective of handling multiple out of order uses changes are not that big I feel.

3065

Ah, both are same.

ABataev added inline comments.Feb 14 2018, 6:50 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1679–1680

Now I see what do you want to do. But I don't think that this the correct way to implement it. It complicates the whole vectorization process. I'd suggest to create different tree entries for each particular order of the loads and exclude loads from the check that the same instruction is used several times in different tree entries.
If you worry about several different loads of the same values, I think they will be optimized by instruction combiner.

ashahid added inline comments.Feb 16 2018, 9:46 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1679–1680

Off course this could have been a better solution but I was not sure of the impact it may have by breaking the single tree entry assumption. One problem I see is the TreeEntry lookup if multiple node with same scalar values are present.
I can use isSame() check to make sure correct tree entry is found, however it may become costly in case of PHI instruction fed by same vector Load.

ABataev added inline comments.Feb 16 2018, 10:29 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1679–1680

I think it is better to start with handling of single tree entry rather than trying to handle all possible situations in a single patch. I suggest to split this patch into 2 parts at least: 1. handling of tree entry with jumbled loads. 2. further improvements.

test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll
8–11

These checks are not autogenerated, fix it. Moreover, it is recommended to commit these tests separately with the checks for the original version of the compiler and the update checks with the fixed version to demonstrate improvements.

Updated the patch to accomodate the review comments.

As suggested, now the reordering mask will be part of each tree entry. Also this update does not consider to optimize the reordered load for multiple operand for now.

By the way, take a look at my D43776 that does the same but in more general way

lib/Transforms/Vectorize/SLPVectorizer.cpp
1662

Why you can do this only if ReuseShuffleIndicies.empty()?

1667–1672

It is enough just to compare VL and Sorted. If they are the same, the loads are not shuffled

1675

Why you can't do to add vectorized tree entry if UserTreeIdx == -1?

1678

Each true or false argument must have to prepend comment with the name of the function parameter, related to this argument

2231

You can remove the last argument here

2816

Why do you need this condition?

3058–3069

Restore the original code here

3091

Remove this empty line

3326–3332

I rather doubt you need all that stuff. You can use original code

test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll
2

You need to add this test separately and show changes in it.

test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll
2

You need to add this test separately and show changes in it

test/Transforms/SLPVectorizer/X86/jumbled-load-used-in-phi.ll
2

You need to add this test separately and show changes in it

test/Transforms/SLPVectorizer/X86/jumbled-load.ll
64

You need to add this test separately and show changes in it

Will commit the tests as NFC.

Seems like I am not getting the mails from phabricator, what shall I do to get the mails?

Checked the patch D43776, seems it will make this patch redundant.

lib/Transforms/Vectorize/SLPVectorizer.cpp
1662

This is to avoid the overlapping the UniqueValues reuse logic of your changes.

1667–1672

Sure it is, but this avoids the compare. So I thought having a boolean is preferable.

1675

My bad, this is not required.

1678

Ok

2231

Sure

2816

In the 2nd test of jumbled-load.ll the two operands of MUL is fed from the same loaded vector. The 1st operand is SHUFFLE of LOAD and the 2nd operand is the gather of the same scalar loads.
Query to getTreeEntry() will always return the node with the same vectorized value and hence both the operand of MUL will be fed the shuffled load.
This check is to avoid this scenario.

3058–3069

Thanks

3326–3332

This is required otherwise *multiuse*.ll test as well as PR32086.ll will fail because the lanes were recorded according to the order of scalar loads.

ashahid marked 9 inline comments as done.

Updated further review comments.

Hope this is fine.

ABataev added inline comments.Feb 28 2018, 9:49 AM
lib/Analysis/LoopAccessAnalysis.cpp
1112

What about this comment? Do you really need Sorted argument?

1125

PointerType *->auto *

1129–1131

I think there must be an assertion instead of this check.

1141

const SCEVConstant *->const auto *

1146–1148

This check better to move to SLPVectorizer.cpp, because the function can be used for masked load/store.

1161

for (unsigned I = 0, E = VL.size(); I < E; ++I)

1166

Actually Mask is a full copy of UseOrder, you don't need all that complex stuff here

lib/Transforms/Vectorize/SLPVectorizer.cpp
1662

Why you can't handle it? What's the problem?

1667–1672

Why do we need the compare?

2816

This scenario should happen in your patch, the instruction either vectorized, or gathered, but not both.

3326–3332

Again, it just may not happen in this patch

ABataev added inline comments.Feb 28 2018, 11:07 AM
lib/Analysis/LoopAccessAnalysis.cpp
1166

Oops, no, Mask is not a copy of UseOrder
But you can create it much simpler:

for (unsigned I = 0, E = VL.size(); I < E; ++I)
  Mask[UseOrder[I]] = I;
sanjoy removed a reviewer: sanjoy.Feb 28 2018, 11:34 AM
sanjoy removed a subscriber: sanjoy.
ashahid added inline comments.Feb 28 2018, 11:30 PM
lib/Analysis/LoopAccessAnalysis.cpp
1112

Yes, otherwise my test fails. Seems it breaks some assumption.

1166

Thanks

lib/Transforms/Vectorize/SLPVectorizer.cpp
1662

It was a thought,I have not checked yet. I will check.

1667–1672

I meant, if we dont use ShuffledLoad flag we have to compare VL vs Sorted instead.

2816

This check is to avoid feeding the generated SHUFFLE to both operand of MUL which is not the intention of the test case.

3326–3332

It does happen and this test fails.

ABataev added inline comments.Mar 2 2018, 10:59 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1678

No, use original VL here, do not use Sorted. In this case you won't need an additional argument in sortLoadAccesses and you don't need all that complex stuff with the lambda on line 3528

ashahid added inline comments.Mar 5 2018, 10:39 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1678

If I am not wrong, for LOADs, VL0 must be the 1st element of the buffer whose base address will be used for vector load.
So using VL will break this assumption.

ABataev added inline comments.Mar 6 2018, 6:18 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1678

Why? And why you can't choose the right VL0 during vectorization?

ashahid added inline comments.Mar 6 2018, 8:20 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1678

For example, if we have two arrays A[4] and B[1] laying one after another in memory and the selected VF is 4 for the scalar loads of A[1], A[2], A[0], A[3] in order of use, the generated vector load will load the elements A[1], A[2], A[3], B[1] which is not desired.

Of-course we can choose the right VL0 during vectorization but we have to compute it again here using the mask which can be avoided if we use Sorted VL.

If I am missing something?

ABataev added inline comments.Mar 6 2018, 8:42 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1678

You already store the mask in the tree entry and you can choose the right VL0 using this mask. Using Sorted VL complicates the whole vectorization process and, thus, adds some extra points for the incorrect vectorization. That's why I insist to use original VL and choose the correct VL0 during codegen.

ashahid added inline comments.Mar 6 2018, 9:08 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1678

Got it. Since you already have these improvements in this patch https://reviews.llvm.org/D43776 , I think it is better to get that through.

RKSimon added a subscriber: RKSimon.Apr 8 2018, 2:43 AM

@ashahid What's happening to this patch?

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2019, 5:02 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
RKSimon reopened this revision.Oct 7 2019, 6:08 AM
This revision is now accepted and ready to land.Oct 7 2019, 6:08 AM