Page MenuHomePhabricator

[SLP] Initial support for the vectorization of the non-power-of-2 vectors.
Needs ReviewPublic

Authored by ABataev on Jan 22 2019, 8:30 AM.

Details

Summary

Possibly vectorized operations are extended to the power-of-2 number with UndefValues to allow to use regular vector operations.

Diff Detail

Event Timeline

ABataev created this revision.Jan 22 2019, 8:30 AM
ABataev updated this revision to Diff 192640.Mar 28 2019, 8:03 AM

Updated to latest version.

Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2019, 8:03 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript

Some initial comments - there's a lot going on (cutting it into smaller patches would be very useful).

The sdiv undefs test worries me, and I'm not certain if we have good masked load/store costs at the moment.

lib/Transforms/Vectorize/SLPVectorizer.cpp
2148

Check for failure instead of just dereferencing?

2696

drop the * ?

2837

drop the * ?

2999–3000

Is it safe to use VL[0] like this - do we guarantee the undef types are correct?

3683–3684

This is going to cause problems for an incoming patch I'm working on - any chance that you can avoid using Instruction &I?

test/Transforms/SLPVectorizer/AArch64/PR38339.ll
16

regression?

test/Transforms/SLPVectorizer/X86/addsub.ll
319 ↗(On Diff #192640)

regression?

test/Transforms/SLPVectorizer/X86/alternate-fp.ll
87

regression?

test/Transforms/SLPVectorizer/X86/alternate-int.ll
611

This test is checking that the sdiv didn't get vectorized, because if any elt is undef then the whole result is undef - but if its stays scalarized then on elts 0 and 4 become undef.

Passing-by question: will this *only* consider non-power-of-two vectors, or all vectors that are smaller than the vector length?
I.e. is this something that will help with e.g. https://godbolt.org/z/h64TuT ? (from IRC)

Passing-by question: will this *only* consider non-power-of-two vectors, or all vectors that are smaller than the vector length?
I.e. is this something that will help with e.g. https://godbolt.org/z/h64TuT ? (from IRC)

It should handle all vector lengths, 6 ops must be vectorized as 8 elems vector with 2 undefs

@ABataev Are you still looking at this or should we move to D66416? Now that X86 has enabled integer widening, we could more easily make use of this.

ABataev updated this revision to Diff 215884.Mon, Aug 19, 6:56 AM

Updated tests

@ABataev Are you still looking at this or should we move to D66416? Now that X86 has enabled integer widening, we could more easily make use of this.

Sorry, just saw your reply on D66416!

@ABataev Are you still looking at this or should we move to D66416? Now that X86 has enabled integer widening, we could more easily make use of this.

I just don't have enough time for this. I referenced this patch in D66416. This patch is not X86 specific (if integer widening is supported only for X86) + it works with the existing cost model more correctly, I believe, since our cost model is not adapted to non-power-2 vectors and must be tweaked carefully. I suggested that someone else could take this patch and split it into several smaller patches, etc. What do you think?

RKSimon added inline comments.Thu, Sep 5, 2:32 PM
lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1174

Should this be a separate patch?

ABataev marked an inline comment as done.Thu, Sep 5, 2:38 PM
ABataev added inline comments.
lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1174

Oops, added it accidentally.

spatel added inline comments.Fri, Sep 6, 11:08 AM
test/Transforms/SLPVectorizer/X86/sext.ll
602–619

For reference, this is the test noted in PR42997:
https://bugs.llvm.org/show_bug.cgi?id=42997
...and this patch will produce the expected AVX codegen:

vpmovsxwq	(%rdi), %xmm0
vpmovsxwq	4(%rdi), %xmm1
vinsertf128	$1, %xmm1, %ymm0, %ymm0