This is an archive of the discontinued LLVM Phabricator instance.

[SLPVectorizer] Try to vectorize in the range from MaxVecRegSize to MinVecRegSize
Needs ReviewPublic

Authored by JongwonLee on Mar 16 2016, 10:06 PM.

Details

Summary

The previous SLP-vectorizer tried to vectorize for only one kind of vector register size, MinVecRegSize.
With this patch, we can try to vectorize in the range from MaxVecRegSize to MinVecRegSize.

Diff Detail

Event Timeline

JongwonLee retitled this revision from to [SLPVectorizer] Change MinVecRegSize from 128 bits to 64 bits.
JongwonLee updated this object.
JongwonLee added subscribers: llvm-commits, flyingforyou.
mcrosier added a subscriber: mcrosier.
mcrosier added inline comments.Mar 17 2016, 6:30 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3427

I'm thinking this should be a TTI hook, so each target can define the MinVecRegSize.

3723

Maybe use VecRegSize, rather than Size here?

3747

I believe you've addressed this fix me, correct?

3987

Same. Remove FIXME.

4342

Maybe use VecRegSize, rather than Size here?

4419

I assume this should be deleted, rather than commented out.

4489

I don't think you need the temporary variable.

for () {
  if (tryToVectorizeList()) {
    Changed = true;
    it = BB->begin();
    e = BB->end();
    break;
  }
}
4586

Please remove.

4587

I don't think you need this temporary value. You can just do something like the below, correct?

for (...)
  if (tryToVectorizeList()) {
    Changed = true;
    break;
  }
mzolotukhin edited edge metadata.Mar 17 2016, 12:26 PM

Hi Jongwon,

Thanks for working on this, I think that's a natural improvement over what we have now. I agree with Chad's remarks and have some more comments.

The patch now includes two semantical parts:

  • Change default MinVecRegSize value for AArch64.
  • Actually use MinVecRegSize in tryToVectorizeList and friends.

They look to me completely separated (and, in the end, needed), so I'd suggest tackling them in separate patches.

For the first part - changing the default value - I totally agree with Chad that it should be done via a target hook. You could take a look at how MaxVecRegSize is implemented. In general, SLPVectorizer.cpp shouldn't contain any target-specific code, and all target-dependent info should be requested via hooks (i.e. having a variable IsAArch64 might be a bad idea).
Also, for this kind of change it would be nice to show the impact on compilation and execution time. LLVM test-suite is good for this, results on other popular testsuites (e.g. SPEC) are also valueable.

The second part is good in general, one possible improvement you could do (if you have time for this) is to refactor vectorizeStoreChain and tryToVectorizeList functions. They'll be very similar when tryToVectorizeList is parametrized with a size too. That potentially should remove a lot of duplication in the code, which is always good.

Also, you'll need new tests for the second part, currently I assume the test you have is for the first part.

Thanks,
Michael

JongwonLee marked 7 inline comments as done.Mar 21 2016, 12:32 AM
JongwonLee added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
3427

I'll separate this from the current patch. The current patch will only handle the range of the size of vectorizable registers.

3747

Yes. The comment are removed.

JongwonLee retitled this revision from [SLPVectorizer] Change MinVecRegSize from 128 bits to 64 bits to [SLPVectorizer] Try to vectorize in the range from MaxVecRegSize to MinVecRegSize.
JongwonLee updated this object.
JongwonLee edited edge metadata.
mcrosier added inline comments.Mar 21 2016, 8:51 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3714

Why not just return true here and remove the unnecessary temp variable?

3718

return false (assuming you do the suggestion above).

4327

Shouldn't the call to tryToVectorizeList() still be predicated on NumElts > 1?

JongwonLee marked 2 inline comments as done.Mar 21 2016, 10:01 PM
JongwonLee added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
4327

Fixed the code to call tryToVectorizeList() when NumElts > 1 is satisfied.

I should have suggested this earlier (sorry), but do you think it make sense to push the loops into tryToVectorizeList and canMatchHorizontalReduction? I think it does; it avoids a lot of duplicated code at each call site.

lib/Transforms/Vectorize/SLPVectorizer.cpp
4338

No that this logic is cleaned up you don't need the temporary bool.

for (unsigned VecRegSize = MaxVecRegSize; VecRegSize >= MinVecRegSize; VecRegSize /= 2) {
  if (tryToVectorizeList(makeArrayRef(IncIt, NumElts), R, None, false, VecRegSize)) {
    // Success start over because instructions might have been changed.
    HaveVectorizedPhiNodes = true;
    Changed = true;
    break;
  }
}

do you think it make sense to push the loops into tryToVectorizeList and canMatchHorizontalReduction?

Actually, it might make sense to keep it as is, the reason being the current form is closer to what we have in vectorizeStoreChain. Eventually, it would be cool to merge these two functions together to reduce code duplication. What do you think?

Michael

do you think it make sense to push the loops into tryToVectorizeList and canMatchHorizontalReduction?

Actually, it might make sense to keep it as is, the reason being the current form is closer to what we have in vectorizeStoreChain. Eventually, it would be cool to merge these two functions together to reduce code duplication. What do you think?

That's fine by me, Michael.

JongwonLee added inline comments.Mar 23 2016, 1:01 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4338

Removed the temporary bool.

JongwonLee updated this revision to Diff 51388.Mar 23 2016, 1:03 AM

Removed the function 'vectorizeStoreChain' and modified the function 'tryToVectroizeList'.

Hi Jongwon,

Thanks for the work, please find some comments below.

Michael

lib/Transforms/Vectorize/SLPVectorizer.cpp
3745

Nitpick: I'd rather swap if and else blocks to avoid negation in the condition.

3754–3755

I think this check should be combined with the one below:

if (!isPowerOf2_32(OpsWidth) || OpsWidth < 2)
  break;

and it should be done independently on vectorizeStoreChain flag.

4596

SLPCostThreshold disappeared after this change. Was it intentional?

mssimpso added inline comments.Mar 23 2016, 1:13 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4596

SLPCostThreshold is a command line option, so it doesn't need to be passed as a function parameter.

mzolotukhin added inline comments.Mar 23 2016, 2:29 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4596

Ah, right, thanks for pointing that out!

JongwonLee marked 2 inline comments as done.Mar 23 2016, 6:57 PM
JongwonLee added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
4596

Yes. mssimpso is right.

JongwonLee updated this revision to Diff 51505.Mar 23 2016, 6:59 PM

Hi Jongwon,

I've inlined a few comments. And I agree with Michael about the compile-time experiments. It would be helpful to see what kind of impact trying different register sizes will have. Thanks!

lib/Transforms/Vectorize/SLPVectorizer.cpp
3501

Unless I missed something, it looks to me like every use of tryToVectorizeList passes VecRegSize. Why make the parameter optional?

4269

I think it would be less confusing and more consistent if MinRegSize was renamed to VecRegSize here.

This comment was removed by JongwonLee.

This patch doesn't change the default value of MinVecRegSize, so it does not affect the performance.
(On top of this patch, another patch(http://reviews.llvm.org/D19151) is needed to change the default value of MinVecRegSize.)
In commercial benchmark, I found that clang lost the chance of 64-bit SLP vectorization in AArch64.
Although there would be side effect of extension of 64-bit SLP vectorization, the compiler should handle 64-bit SLP vectorization in AArch64. The side effect, if any, should be handled with another patch.

I tried to upload the data about compile-time and execution-time, but .csv file is not attached in the comment box. Please guide me how to upload .csv file if possible.

Hey Jongwon,

I think it's common to just paste in the most significant execution and compile time improvements and regressions. It's not necessary to report every data point.

Even though you're not changing the value of MinVecRegSize here, I think it makes sense to evaluate this patch and D19151 together so we can see the effect (particularly in compile time) that looping over potential register sizes will have.

In SPEC benchmark, compile-time increases by 2.8%. In LLVM test-suite, compile-time decreases by 9 %. The data are measured from the average of three times executions.