This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Consider interleaving when deciding if epilogue vectorisation is beneficial
Needs Review · Public

Authored by kmclaughlin on May 18 2023, 8:12 AM.

Details

Summary

Changes isEpilogueVectorizationProfitable to multiply the VF of the original loop by the
interleave count and compare the product with EpilogueVectorizationMinVF.
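The check described above can be sketched as follows. This is a hypothetical, simplified model and not LLVM's isEpilogueVectorizationProfitable, which also consults the target and handles scalable VFs. The function names are made up for illustration, and the `>=` comparison and the threshold passed in as `MinVF` (standing in for EpilogueVectorizationMinVF) are assumptions.

```cpp
#include <cassert>

// Hypothetical sketch only; not LLVM's isEpilogueVectorizationProfitable.
// MinVF stands in for EpilogueVectorizationMinVF; the >= is an assumption.

// Before the patch (as described in the summary): only the main loop's VF
// is compared with the threshold; the interleave count is ignored.
bool epilogueProfitableBefore(unsigned MainVF, unsigned /*IC*/,
                              unsigned MinVF) {
  return MainVF >= MinVF;
}

// After the patch (as described in the summary): the main loop's VF is
// first scaled by the interleave count.
bool epilogueProfitableAfter(unsigned MainVF, unsigned IC, unsigned MinVF) {
  return MainVF * IC >= MinVF;
}
```

With a threshold of 16, `epilogueProfitableBefore(8, 2, 16)` is false while `epilogueProfitableAfter(8, 2, 16)` is true, so VF=8, IC=2 now passes the check exactly as VF=16, IC=1 does.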

When interleaving, we reduce the likelihood of doing any vector work because the minimum
trip count required to enter the vector loop increases, in the same way as it does when
choosing a high VF.
This change lets loops with interleaving be considered for epilogue vectorisation on an
equal footing: for example, VF=8, IC=2 is now treated in the same way as VF=16, IC=1.
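The trip-count argument can be made concrete with a small model of how N iterations are split between the main vector loop (VF * IC elements per iteration), an optional vector epilogue and the scalar remainder. This is a hypothetical sketch, not LLVM code: `splitTripCount` and `Split` are made-up names, and it ignores runtime checks, tail folding and loops that require a scalar epilogue. It also assumes the vector epilogue can run when the main loop is skipped.

```cpp
#include <cassert>

// Hypothetical model, not LLVM code. Elements handled by each part of the
// vectorized loop nest for a trip count N.
struct Split {
  unsigned MainElems;     // handled by the main vector loop
  unsigned EpilogueElems; // handled by the vector epilogue
  unsigned ScalarElems;   // left for the scalar remainder loop
};

// Assumes VF >= 1 and IC >= 1. EpiVF == 0 means "no vector epilogue".
Split splitTripCount(unsigned N, unsigned VF, unsigned IC, unsigned EpiVF) {
  const unsigned Step = VF * IC;           // elements per main-loop iteration
  const unsigned Main = (N / Step) * Step; // 0 when N < Step: loop not entered
  const unsigned Rem = N - Main;           // up to Step - 1 elements remain
  unsigned Epi = 0;
  if (EpiVF != 0)
    Epi = (Rem / EpiVF) * EpiVF;           // whole epilogue iterations only
  return {Main, Epi, Rem - Epi};
}
```

For N=30, VF=8, IC=2 the main loop covers 16 elements and leaves 14; without a vector epilogue all 14 run scalar, while an epilogue with VF=8 covers 8 of them and leaves 6. VF=16, IC=1 has the same step of 16, so it produces the same split, which is why the two configurations are worth treating alike.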

I ran SPEC2017 on neoverse-v1 with this change and saw no overall difference in the geometric
mean, and no differences in the performance of individual benchmarks beyond noise.
However, this change gives about a 6% performance improvement for x264 on publicly
available SVE2 hardware.


Event Timeline

kmclaughlin created this revision. May 18 2023, 8:12 AM
Herald added a project: Restricted Project. May 18 2023, 8:12 AM
kmclaughlin requested review of this revision. May 18 2023, 8:12 AM
fhahn added a subscriber: fhahn. May 22 2023, 12:43 AM
fhahn added inline comments.
llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll, line 76

I think here it would probably be better to use VF=4 for the epilogue loop to use the full vector width. I think the existing logic to pick the epilogue vectorization factor picks the next lowest VF, which probably needs adjusting as well.

In addition to that, it would be good to verify the impact with some microbenchmarks (these could be added to https://github.com/llvm/llvm-test-suite/tree/main/MicroBenchmarks/LoopVectorization).
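The reviewer's point about the epilogue VF can be illustrated with two selection policies. This is a hypothetical sketch, not LLVM's epilogue-VF selection logic: both function names are made up, and the numbers used below are illustrative rather than taken from interleaving-reduction.ll. The idea is that when the main loop is interleaved (IC > 1), its remainder can be as large as VF * IC - 1, which is at least VF, so an epilogue at the main loop's full VF can still execute at least once.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of two epilogue-VF policies; not LLVM code.
// A return value of 0 means "no suitable epilogue VF".

// "Next lowest" policy: the largest candidate strictly below the main VF.
unsigned pickNextLowest(const std::vector<unsigned> &Candidates,
                        unsigned MainVF) {
  unsigned Best = 0;
  for (unsigned VF : Candidates)
    if (VF < MainVF && VF > Best)
      Best = VF;
  return Best;
}

// Alternative along the lines suggested in review: with IC > 1 the
// remainder can reach MainVF * IC - 1 >= MainVF elements, so a
// full-width epilogue (VF == MainVF) is also allowed.
unsigned pickAllowingFullWidth(const std::vector<unsigned> &Candidates,
                               unsigned MainVF, unsigned IC) {
  unsigned Best = 0;
  for (unsigned VF : Candidates)
    if ((VF < MainVF || (IC > 1 && VF == MainVF)) && VF > Best)
      Best = VF;
  return Best;
}
```

With candidates {2, 4, 8} and a main loop of VF=4, IC=2, the first policy picks 2 while the second picks 4, matching the reviewer's preference for a full-width epilogue. Without interleaving (IC=1) both policies pick 2.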

Matt added a subscriber: Matt. Aug 18 2023, 10:09 PM