This is an archive of the discontinued LLVM Phabricator instance.

Differential D70068

[SLP] Enhance SLPVectorizer to vectorize vector aggregate
ClosedPublic

Authored by anton-afanasyev on Nov 11 2019, 1:03 AM.

Details

Reviewers

RKSimon
ABataev
dtemirbulatov
spatel
vporpo

Commits

rG80cd6b6e043f: [SLP] Enhance SLPVectorizer to vectorize vector aggregate

Summary

A vector aggregate is a homogeneous aggregate of vectors, such as { <2 x float>, <2 x float> }.
This patch allows findBuildAggregate() to consider vector aggregates as
well as scalar ones. For instance, { <2 x float>, <2 x float> } maps to <4 x float>.

Fixes the vector part of llvm.org/PR42022
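
The mapping described in the summary can be illustrated with a small standalone model. This is only a sketch, not the code from the patch: `ToyType` and `flatLaneCount` are hypothetical names, LLVM types are reduced to a scalar tag plus a lane count, and the register-size and store-size checks that `canMapToVector()` also performs are omitted.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an LLVM type (hypothetical, for illustration only):
// a scalar kind plus a lane count. Lanes == 1 models a plain scalar.
struct ToyType {
  int ScalarKind; // hypothetical tag, e.g. 0 = float, 1 = i16
  unsigned Lanes; // number of vector lanes, 1 for scalars
  bool operator==(const ToyType &O) const {
    return ScalarKind == O.ScalarKind && Lanes == O.Lanes;
  }
};

// Returns the lane count of the flat vector a struct could map to, or 0 if
// the struct is empty or its elements are not all of the same type.
unsigned flatLaneCount(const std::vector<ToyType> &Elements) {
  if (Elements.empty())
    return 0;
  const ToyType &First = Elements.front();
  assert(First.Lanes > 0 && "every type has at least one lane");
  for (const ToyType &E : Elements)
    if (!(E == First))
      return 0; // mixed element types: no isomorphic vector
  // N struct elements of M lanes each flatten to N * M lanes.
  return static_cast<unsigned>(Elements.size()) * First.Lanes;
}
```

In this model, two `<2 x float>` elements give a lane count of 4, matching the `<4 x float>` in the summary, while a struct mixing `<2 x float>` with scalar `float` elements is rejected.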

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

anton-afanasyev created this revision.Nov 11 2019, 1:03 AM

Herald added a project: Restricted Project.Nov 11 2019, 1:03 AM

Herald added subscribers: llvm-commits, hiraditya.

Harbormaster completed remote builds in B40729: Diff 228642.Nov 11 2019, 1:07 AM

Oops, I missed the regression test; I'll add it.

Add test

Update summary

Harbormaster completed remote builds in B40730: Diff 228645.Nov 11 2019, 1:25 AM

Harbormaster completed remote builds in B40731: Diff 228646.

RKSimon added reviewers: ABataev, dtemirbulatov, spatel, vporpo.Nov 11 2019, 3:42 AM

RKSimon added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll
4	Regenerate with update_test_checks.py ?

anton-afanasyev marked 2 inline comments as done.Nov 11 2019, 4:10 AM

anton-afanasyev added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll
4	Should it really be autogenerated? I don't like autogenerated tests because their checks are excessive and produce false positives. Here we are just checking that a `<4 x float>` vector is generated. Is autogeneration really the standard for tests now?

dtemirbulatov added inline comments.Nov 11 2019, 5:00 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6981	Maybe we could try other sizes here, for example, 8?

anton-afanasyev marked 3 inline comments as done.Nov 11 2019, 5:10 AM

anton-afanasyev added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6981	Are you talking about the number 4 in `SmallVector<Value*, 4>`? It is just the default size preallocated for the vector; it could potentially be resized by `push_back()` below, but I believe it won't be in most cases.

anton-afanasyev marked 2 inline comments as done.Nov 11 2019, 5:19 AM

anton-afanasyev added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6981	*default capacity

anton-afanasyev edited the summary of this revision.Nov 11 2019, 5:20 AM

dtemirbulatov added inline comments.Nov 11 2019, 5:32 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6981	yes, correct

ABataev added inline comments.Nov 11 2019, 7:22 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3103–3104	Why do we have this check?
3103–3104	Also, would be good to have a test for this if we don't have it yet.
6974–6977	Not formatted
6984–6986	Just `BuildVectorOpds.append(TmpBuildVectorOpds.rbegin(), TmpBuildVectorOpds.rend());`
llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll
4	It is common practice to use auto checks and demonstrate the difference then.

RKSimon added inline comments.Nov 11 2019, 7:51 AM

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll
4	update_test_checks.py helps us identify any hidden regressions. Also, you can probably remove the dso_local and local_unnamed_addr #0 tags.

anton-afanasyev marked 11 inline comments as done.Nov 11 2019, 8:07 AM

anton-afanasyev added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3103–3104	Hmm, you're right, I'll just delete this check. I haven't hit any issue with scalable vectors, but decided to check just in case (`{ <vscale x 2 x float>, <vscale x 2 x float> }` is not isomorphic to `<4 x float>`?). But I see no similar check anywhere in `SLPVectorizer.cpp`, so I'll delete it.
6974–6977	Thanks, fixed with `clang-format`.
6984–6986	Sure, thanks!
llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll
4	OK, thanks, I'll fix it.
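
The `append(rbegin(), rend())` suggestion for lines 6984–6986 above relies on the final `std::reverse` in `findBuildAggregate()`: the insertvalue chain is walked from the last field back to `undef`, so each field's operands are appended in reverse and a single reversal at the end restores the overall order. The following is a rough standalone sketch of that idea, assuming plain `int` values in place of `Value *` and a hypothetical helper name `collectOperands`; it is not the code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Each inner vector holds one struct field's scalar operands in forward
// order, as a findBuildVector-like helper would return them. Ints stand in
// for Value pointers (an assumption made for this sketch).
std::vector<int>
collectOperands(const std::vector<std::vector<int>> &FieldsInOrder) {
  std::vector<int> Ops;
  // Visit the fields from last to first, like following the insertvalue
  // chain from the final instruction back to undef.
  for (auto It = FieldsInOrder.rbegin(); It != FieldsInOrder.rend(); ++It)
    Ops.insert(Ops.end(), It->rbegin(), It->rend()); // append reversed
  // One reversal at the end restores both field order and lane order.
  std::reverse(Ops.begin(), Ops.end());
  return Ops;
}
```

Appending each field reversed avoids a separate per-field reversal step: the operands only need to be in the right order after the final `std::reverse`.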

Update

Harbormaster completed remote builds in B40745: Diff 228702.Nov 11 2019, 8:13 AM

ABataev added inline comments.Nov 11 2019, 8:18 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3103–3104	My question was different. I just wanted an explanation of what the problem with scalable vectors is. If the check is required, we must keep it. Also, it would still be good to have a test for this situation.

anton-afanasyev marked 2 inline comments as done.Nov 11 2019, 10:20 AM

anton-afanasyev added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3103–3104	I wasn't familiar with the `vscale` property, but I've learned about it now. That check was actually unnecessary: `{ <vscale x N x Ty>, <vscale x N x Ty> }` maps to `<vscale x 2N x Ty>`, so we can handle the case where scalability is set to true.

RKSimon added inline comments.Nov 12 2019, 1:26 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3103–3104	Probably worth adding at least some test coverage for this - in aarch64 maybe?

anton-afanasyev marked 4 inline comments as done.Nov 13 2019, 8:07 AM

anton-afanasyev added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3103–3104	Actually, the `vscale` property cannot be set to true for such cases. A structure like `{ <vscale x 2 x float> }` is forbidden: I got `error: invalid element type for struct` when trying to run `opt` on it.
3103–3104	What do you mean by test coverage here? I can copy the same `pr42022.ll` to `llvm/test/Transforms/SLPVectorizer/AArch64`, but the only difference will be the `-mtriple=` option. If you are talking about the original issue (https://llvm.org/pr42022), on aarch64 vectorization succeeds for the "Bad" case and fails for the two "Good" cases (https://godbolt.org/z/5JEGSm), but the root cause is the aarch64 ABI, which differs from x86-64. This patch neither changes nor addresses that.

RKSimon added inline comments.Nov 13 2019, 8:29 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3103–3104	I was referring to adding vscale tests - but if they can't occur then that's fine.

I think we cannot handle several similar cases with aggregates inside aggregates (e.g., [2 x {float, float}], {{float, float}, {float, float}}). Perhaps we could address some of these cases too with this patch? Please see the tests below:

; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

%StructTy = type { float, float}

define [2 x %StructTy] @ArrayOfStruct(float *%Ptr) {
  %GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
  %L0 = load float, float * %GEP0
  %GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
  %L1 = load float, float * %GEP1
  %GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
  %L2 = load float, float * %GEP2
  %GEP3 = getelementptr inbounds float, float* %Ptr, i64 3
  %L3 = load float, float * %GEP3

  %Fadd0 = fadd fast float %L0, 1.1e+01
  %Fadd1 = fadd fast float %L1, 1.2e+01
  %Fadd2 = fadd fast float %L2, 1.3e+01
  %Fadd3 = fadd fast float %L3, 1.4e+01

  %StructIn0 = insertvalue %StructTy undef, float %Fadd0, 0
  %StructIn1 = insertvalue %StructTy %StructIn0, float %Fadd1, 1

  %StructIn2 = insertvalue %StructTy undef, float %Fadd2, 0
  %StructIn3 = insertvalue %StructTy %StructIn2, float %Fadd3, 1

  %Ret0 = insertvalue [2 x %StructTy] undef, %StructTy %StructIn1, 0
  %Ret1 = insertvalue [2 x %StructTy] %Ret0, %StructTy %StructIn3, 1
  ret [2 x %StructTy] %Ret1
}

define {%StructTy, %StructTy} @StructOfStruct(float *%Ptr) {
  %GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
  %L0 = load float, float * %GEP0
  %GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
  %L1 = load float, float * %GEP1
  %GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
  %L2 = load float, float * %GEP2
  %GEP3 = getelementptr inbounds float, float* %Ptr, i64 3
  %L3 = load float, float * %GEP3

  %Fadd0 = fadd fast float %L0, 1.1e+01
  %Fadd1 = fadd fast float %L1, 1.2e+01
  %Fadd2 = fadd fast float %L2, 1.3e+01
  %Fadd3 = fadd fast float %L3, 1.4e+01

  %StructIn0 = insertvalue %StructTy undef, float %Fadd0, 0
  %StructIn1 = insertvalue %StructTy %StructIn0, float %Fadd1, 1

  %StructIn2 = insertvalue %StructTy undef, float %Fadd2, 0
  %StructIn3 = insertvalue %StructTy %StructIn2, float %Fadd3, 1

  %Ret0 = insertvalue {%StructTy, %StructTy} undef, %StructTy %StructIn1, 0
  %Ret1 = insertvalue {%StructTy, %StructTy} %Ret0, %StructTy %StructIn3, 1
  ret {%StructTy, %StructTy} %Ret1
}

In D70068#1746753, @vporpo wrote:

I think we cannot handle several similar cases with aggregates inside aggregates (e.g., [2 x {float, float}], {{float, float}, {float, float}}). Perhaps we could address some of these cases too with this patch? Please see the tests below:
...

Good point! I've checked your case {{float, float}, {float, float}} with a modified patch and it was vectorized successfully, but this required several hacks.
So maybe a separate patch is better? One needs to modify canMapToVector() and findBuildAggregate() to be recursive and merge findBuildVector() with findBuildAggregate(). I can send that for review as a follow-up patch while fixing the next part, Matrix22, of the https://llvm.org/pr42022 issue.

Well, ideally we should be able to handle any combination and nesting of scalars, vectors and aggregates in a unified way.
For example:

{float, <2 x float>, float},
{float, float, <2 x float>},
{{float, float}, <4 x float>, <2 x float>},
{{float}, {float}, [2 x float]}, 
{{{float, float}, float}, float}

etc.
This would require some redesign of the code, which is why I think it makes sense to have a single patch for all of them.

But yes, feel free to split them into separate patches.

But could you also add some more tests to show which of these cases this patch is taking care of. For example, I think {<2 x float>, <2 x float>, <4 x float>} or {<2 x float>, float, float} will not work. Here are some examples:

; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s  

define { <2 x float>, float, float } @StructOfVectorAndScalars(float *%Ptr) {
  %GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
  %L0 = load float, float * %GEP0
  %GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
  %L1 = load float, float * %GEP1
  %GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
  %L2 = load float, float * %GEP2
  %GEP3 = getelementptr inbounds float, float* %Ptr, i64 3
  %L3 = load float, float * %GEP3

  %VecIn0 = insertelement <2 x float> undef, float %L0, i64 0
  %VecIn1 = insertelement <2 x float> %VecIn0, float %L1, i64 1

  %Ret0 = insertvalue {<2 x float>, float, float} undef, <2 x float> %VecIn1, 0
  %Ret1 = insertvalue {<2 x float>, float, float} %Ret0, float %L2, 1
  %Ret2 = insertvalue {<2 x float>, float, float} %Ret1, float %L3, 2
  ret {<2 x float>, float, float} %Ret2
}

define { <2 x float>, <2 x float>, <4 x float> } @StructOfVectors(float *%Ptr) {
  %GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
  %L0 = load float, float * %GEP0
  %GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
  %L1 = load float, float * %GEP1
  %GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
  %L2 = load float, float * %GEP2
  %GEP3 = getelementptr inbounds float, float* %Ptr, i64 3
  %L3 = load float, float * %GEP3

  %GEP4 = getelementptr inbounds float, float* %Ptr, i64 4
  %L4 = load float, float * %GEP4
  %GEP5 = getelementptr inbounds float, float* %Ptr, i64 5
  %L5 = load float, float * %GEP5
  %GEP6 = getelementptr inbounds float, float* %Ptr, i64 6
  %L6 = load float, float * %GEP6
  %GEP7 = getelementptr inbounds float, float* %Ptr, i64 7
  %L7 = load float, float * %GEP7

  %VecIn0 = insertelement <2 x float> undef, float %L0, i64 0
  %VecIn1 = insertelement <2 x float> %VecIn0, float %L1, i64 1

  %VecIn2 = insertelement <2 x float> undef, float %L2, i64 0
  %VecIn3 = insertelement <2 x float> %VecIn2, float %L3, i64 1

  %VecIn4 = insertelement <4 x float> undef, float %L4, i64 0
  %VecIn5 = insertelement <4 x float> %VecIn4, float %L5, i64 1
  %VecIn6 = insertelement <4 x float> %VecIn5, float %L6, i64 2
  %VecIn7 = insertelement <4 x float> %VecIn6, float %L7, i64 3

  %Ret0 = insertvalue {<2 x float>, <2 x float>, <4 x float>} undef, <2 x float> %VecIn1, 0
  %Ret1 = insertvalue {<2 x float>, <2 x float>, <4 x float>} %Ret0, <2 x float> %VecIn3, 1
  %Ret2 = insertvalue {<2 x float>, <2 x float>, <4 x float>} %Ret1, <4 x float> %VecIn7, 2
  ret {<2 x float>, <2 x float>, <4 x float>} %Ret2
}

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll

Could you replace this test with the smallest test that exposes the issue?
I think something like this should do the job:

; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

; Checks that vector insertvalues into the struct become SLP seeds.
define { <2 x float>, <2 x float> } @StructOfVectors(float *%Ptr) {
  %GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
  %L0 = load float, float * %GEP0
  %GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
  %L1 = load float, float * %GEP1
  %GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
  %L2 = load float, float * %GEP2
  %GEP3 = getelementptr inbounds float, float* %Ptr, i64 3
  %L3 = load float, float * %GEP3

  %VecIn0 = insertelement <2 x float> undef, float %L0, i64 0
  %VecIn1 = insertelement <2 x float> %VecIn0, float %L1, i64 1

  %VecIn2 = insertelement <2 x float> undef, float %L2, i64 0
  %VecIn3 = insertelement <2 x float> %VecIn2, float %L3, i64 1

  %Ret0 = insertvalue {<2 x float>, <2 x float>} undef, <2 x float> %VecIn1, 0
  %Ret1 = insertvalue {<2 x float>, <2 x float>} %Ret0, <2 x float> %VecIn3, 1
  ret {<2 x float>, <2 x float>} %Ret1
}

anton-afanasyev marked 2 inline comments as done.Nov 16 2019, 2:24 AM

anton-afanasyev added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll
1	Thanks, I've replaced this with your test!

Update test with shorter one

Harbormaster completed remote builds in B41080: Diff 229683.Nov 16 2019, 2:28 AM

In D70068#1748108, @vporpo wrote:
Well, ideally we should be able to handle any combination and nesting of scalars, vectors and aggregates in a unified way.
For example:
{float, <2 x float>, float},
{float, float, <2 x float>},
{{float, float}, <4 x float>, <2 x float>},
{{float}, {float}, [2 x float]}, 
{{{float, float}, float}, float}
etc.
This would require some redesign of the code, which is why I think it makes sense to have a single patch for all of them.

But yes, feel free to split them into separate patches.

Actually, I'm not sure the vectorizer should support vectorization of _any_ combination. For instance, does an add operation make sense for a structure like {float, float, <2 x float>} in real life (dictated by frontend output, mostly from )? I would focus on _homogeneous_ aggregates, that is, aggregates whose elements all have the same type, at any nesting depth, like {{{float, float}, {float, float}},{{float, float}, {float, float}}}.
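
One way to picture the "homogeneous at any nesting depth" criterion is a recursive check over a type tree. The sketch below is an assumption-laden illustration, not the approach taken in this patch or in the follow-up: `ToyNode`, `leaf`, `aggregate`, `sameType` and `homogeneousLeafCount` are hypothetical names, and structs, arrays and vectors are all modelled as the same kind of interior node.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy type tree (hypothetical): a leaf is a scalar identified by ScalarKind;
// an interior node is an aggregate whose children are its elements.
struct ToyNode {
  int ScalarKind = 0;            // only meaningful for leaves
  std::vector<ToyNode> Children; // empty for leaves
};

ToyNode leaf(int Kind) {
  ToyNode N;
  N.ScalarKind = Kind;
  return N;
}

ToyNode aggregate(std::vector<ToyNode> Kids) {
  ToyNode N;
  N.Children = std::move(Kids);
  return N;
}

// Structural equality: same shape and same scalar kinds at the leaves.
bool sameType(const ToyNode &A, const ToyNode &B) {
  if (A.Children.size() != B.Children.size())
    return false;
  if (A.Children.empty())
    return A.ScalarKind == B.ScalarKind;
  for (std::size_t I = 0; I < A.Children.size(); ++I)
    if (!sameType(A.Children[I], B.Children[I]))
      return false;
  return true;
}

// Counts scalar leaves, or returns 0 if some level mixes element types.
unsigned homogeneousLeafCount(const ToyNode &Node) {
  if (Node.Children.empty())
    return 1; // a scalar contributes one lane
  const ToyNode &First = Node.Children.front();
  for (const ToyNode &C : Node.Children)
    if (!sameType(C, First))
      return 0;
  // All children are identical, so count one and multiply.
  return homogeneousLeafCount(First) *
         static_cast<unsigned>(Node.Children.size());
}
```

Under this model the three-level nesting of pairs quoted above yields 8 lanes, whereas an aggregate that places two scalars next to a two-element aggregate is rejected.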

In D70068#1748108, @vporpo wrote:

...
But could you also add some more tests to show which of these cases this patch is taking care of. For example, I think {<2 x float>, <2 x float>, <4 x float>} or {<2 x float>, float, float} will not work. Here are some examples:
...

Yes, thanks, I'll add two tests for the homogeneous and non-homogeneous cases.

Added several tests

Harbormaster completed remote builds in B41081: Diff 229685.Nov 16 2019, 3:45 AM

Added test for {{{i16,i16},{i16,i16}},{{i16,i16},{i16,i16}}}.
I'll handle this and the other previously added test cases in a follow-up commit.

Harbormaster completed remote builds in B41293: Diff 230400.Nov 21 2019, 2:00 AM

Is it ok to lgtm for now?

Please can you pre-commit the tests with current codegen and then rebase to show the diff?

Updated test diff against precommitted tests

In D70068#1755110, @RKSimon wrote:

Please can you pre-commit the tests with current codegen and then rebase to show the diff?

Ok, done

Harbormaster completed remote builds in B41358: Diff 230602.Nov 22 2019, 12:50 AM

Updated

Harbormaster completed remote builds in B41359: Diff 230603.Nov 22 2019, 12:52 AM

Here is a further enhancement considering different combinations of aggregates: https://reviews.llvm.org/D70587

LGTM - thanks

This revision is now accepted and ready to land.Nov 22 2019, 3:12 AM

anton-afanasyev mentioned this in rG6d73265ad841: [SLP][Test] Precommit tests for D70068 and D70587. NFC..Nov 22 2019, 8:54 AM

Closed by commit rG80cd6b6e043f: [SLP] Enhance SLPVectorizer to vectorize vector aggregate (authored by anton-afanasyev).Nov 22 2019, 9:12 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	33 lines
llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll	25 lines

Diff 230671

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[624 lines not shown]	public:
}		}

// \returns minimum vector register size as set by cl::opt.		// \returns minimum vector register size as set by cl::opt.
unsigned getMinVecRegSize() const {		unsigned getMinVecRegSize() const {
return MinVecRegSize;		return MinVecRegSize;
}		}

/// Check if ArrayType or StructType is isomorphic to some VectorType.		/// Check if ArrayType or StructType is isomorphic to some VectorType.
		/// Accepts homogeneous aggregate of vectors like
		/// { <2 x float>, <2 x float> }
///		///
/// \returns number of elements in vector if isomorphism exists, 0 otherwise.		/// \returns number of elements in vector if isomorphism exists, 0 otherwise.
unsigned canMapToVector(Type *T, const DataLayout &DL) const;		unsigned canMapToVector(Type *T, const DataLayout &DL) const;

/// \returns True if the VectorizableTree is both tiny and not fully		/// \returns True if the VectorizableTree is both tiny and not fully
/// vectorizable. We do not vectorize such trees.		/// vectorizable. We do not vectorize such trees.
bool isTreeTinyAndNotFullyVectorizable() const;		bool isTreeTinyAndNotFullyVectorizable() const;

[2,450 lines not shown]	unsigned BoUpSLP::canMapToVector(Type *T, const DataLayout &DL) const {
auto *ST = dyn_cast<StructType>(T);		auto *ST = dyn_cast<StructType>(T);
if (ST) {		if (ST) {
N = ST->getNumElements();		N = ST->getNumElements();
EltTy = *ST->element_begin();		EltTy = *ST->element_begin();
} else {		} else {
N = cast<ArrayType>(T)->getNumElements();		N = cast<ArrayType>(T)->getNumElements();
EltTy = cast<ArrayType>(T)->getElementType();		EltTy = cast<ArrayType>(T)->getElementType();
}		}

		if (auto *VT = dyn_cast<VectorType>(EltTy)) {
		EltTy = VT->getElementType();
		N *= VT->getNumElements();
		}

if (!isValidElementType(EltTy))		if (!isValidElementType(EltTy))
return 0;		return 0;
uint64_t VTSize = DL.getTypeStoreSizeInBits(VectorType::get(EltTy, N));		uint64_t VTSize = DL.getTypeStoreSizeInBits(VectorType::get(EltTy, N));
if (VTSize < MinVecRegSize || VTSize > MaxVecRegSize || VTSize != DL.getTypeStoreSizeInBits(T))		if (VTSize < MinVecRegSize || VTSize > MaxVecRegSize || VTSize != DL.getTypeStoreSizeInBits(T))
return 0;		return 0;
if (ST) {		if (ST) {
// Check that struct is homogeneous.		// Check that struct is homogeneous.
for (const auto *Ty : ST->elements())		for (const auto *Ty : ST->elements())
if (Ty != EltTy)		if (Ty != *ST->element_begin())
return 0;		return 0;
}		}
return N;		return N;
}		}

bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,		bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,
SmallVectorImpl<unsigned> &CurrentOrder) const {		SmallVectorImpl<unsigned> &CurrentOrder) const {
Instruction *E0 = cast<Instruction>(OpValue);		Instruction *E0 = cast<Instruction>(OpValue);
[3,839 lines not shown]	do {
if (!LastInsertElem || !LastInsertElem->hasOneUse())		if (!LastInsertElem || !LastInsertElem->hasOneUse())
return false;		return false;
} while (true);		} while (true);
std::reverse(BuildVectorOpds.begin(), BuildVectorOpds.end());		std::reverse(BuildVectorOpds.begin(), BuildVectorOpds.end());
return true;		return true;
}		}

/// Like findBuildVector, but looks for construction of aggregate.		/// Like findBuildVector, but looks for construction of aggregate.
		/// Accepts homegeneous aggregate of vectors like { <2 x float>, <2 x float> }.
///		///
/// \return true if it matches.		/// \return true if it matches.
static bool findBuildAggregate(InsertValueInst *IV,		static bool findBuildAggregate(InsertValueInst *IV, TargetTransformInfo *TTI,
SmallVectorImpl<Value *> &BuildVectorOpds) {		SmallVectorImpl<Value *> &BuildVectorOpds,
		int &UserCost) {
		UserCost = 0;
do {		do {
		if (auto *IE = dyn_cast<InsertElementInst>(IV->getInsertedValueOperand())) {
		int TmpUserCost;
		SmallVector<Value *, 4> TmpBuildVectorOpds;
		if (!findBuildVector(IE, TTI, TmpBuildVectorOpds, TmpUserCost))
		return false;
		BuildVectorOpds.append(TmpBuildVectorOpds.rbegin(), TmpBuildVectorOpds.rend());
		UserCost += TmpUserCost;
		} else {
BuildVectorOpds.push_back(IV->getInsertedValueOperand());		BuildVectorOpds.push_back(IV->getInsertedValueOperand());
		}
Value *V = IV->getAggregateOperand();		Value *V = IV->getAggregateOperand();
if (isa<UndefValue>(V))		if (isa<UndefValue>(V))
break;		break;
IV = dyn_cast<InsertValueInst>(V);		IV = dyn_cast<InsertValueInst>(V);
if (!IV || !IV->hasOneUse())		if (!IV || !IV->hasOneUse())
return false;		return false;
} while (true);		} while (true);
std::reverse(BuildVectorOpds.begin(), BuildVectorOpds.end());		std::reverse(BuildVectorOpds.begin(), BuildVectorOpds.end());
[156 lines not shown]	auto &&ExtraVectorization = [this](Instruction *I, BoUpSLP &R) -> bool {
return tryToVectorize(I, R);		return tryToVectorize(I, R);
};		};
return tryToVectorizeHorReductionOrInstOperands(P, I, BB, R, TTI,		return tryToVectorizeHorReductionOrInstOperands(P, I, BB, R, TTI,
ExtraVectorization);		ExtraVectorization);
}		}

bool SLPVectorizerPass::vectorizeInsertValueInst(InsertValueInst *IVI,		bool SLPVectorizerPass::vectorizeInsertValueInst(InsertValueInst *IVI,
BasicBlock *BB, BoUpSLP &R) {		BasicBlock *BB, BoUpSLP &R) {
		int UserCost = 0;
const DataLayout &DL = BB->getModule()->getDataLayout();		const DataLayout &DL = BB->getModule()->getDataLayout();
if (!R.canMapToVector(IVI->getType(), DL))		if (!R.canMapToVector(IVI->getType(), DL))
return false;		return false;

SmallVector<Value *, 16> BuildVectorOpds;		SmallVector<Value *, 16> BuildVectorOpds;
if (!findBuildAggregate(IVI, BuildVectorOpds))		if (!findBuildAggregate(IVI, TTI, BuildVectorOpds, UserCost))
return false;		return false;

LLVM_DEBUG(dbgs() << "SLP: array mappable to vector: " << *IVI << "\n");		LLVM_DEBUG(dbgs() << "SLP: array mappable to vector: " << *IVI << "\n");
// Aggregate value is unlikely to be processed in vector register, we need to		// Aggregate value is unlikely to be processed in vector register, we need to
// extract scalars into scalar registers, so NeedExtraction is set true.		// extract scalars into scalar registers, so NeedExtraction is set true.
return tryToVectorizeList(BuildVectorOpds, R);		return tryToVectorizeList(BuildVectorOpds, R, UserCost);
}		}

bool SLPVectorizerPass::vectorizeInsertElementInst(InsertElementInst *IEI,		bool SLPVectorizerPass::vectorizeInsertElementInst(InsertElementInst *IEI,
BasicBlock *BB, BoUpSLP &R) {		BasicBlock *BB, BoUpSLP &R) {
int UserCost;		int UserCost;
SmallVector<Value *, 16> BuildVectorOpds;		SmallVector<Value *, 16> BuildVectorOpds;
if (!findBuildVector(IEI, TTI, BuildVectorOpds, UserCost) ||		if (!findBuildVector(IEI, TTI, BuildVectorOpds, UserCost) ||
(llvm::all_of(BuildVectorOpds,		(llvm::all_of(BuildVectorOpds,
[293 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	; See https://reviews.llvm.org/D70068 and https://reviews.llvm.org/D70587 for context			; See https://reviews.llvm.org/D70068 and https://reviews.llvm.org/D70587 for context

	; Checks that vector insertvalues into the struct become SLP seeds.			; Checks that vector insertvalues into the struct become SLP seeds.
	define { <2 x float>, <2 x float> } @StructOfVectors(float *%Ptr) {			define { <2 x float>, <2 x float> } @StructOfVectors(float *%Ptr) {
	; CHECK-LABEL: @StructOfVectors(			; CHECK-LABEL: @StructOfVectors(
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[PTR:%.*]], i64 0			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[PTR:%.*]], i64 0
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[GEP2]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = fadd fast <2 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>
	; CHECK-NEXT: [[TMP6:%.*]] = fadd fast <2 x float> [[TMP4]], <float 1.300000e+01, float 1.400000e+01>			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 0			; CHECK-NEXT: [[VECIN0:%.*]] = insertelement <2 x float> undef, float [[TMP4]], i64 0
	; CHECK-NEXT: [[VECIN0:%.*]] = insertelement <2 x float> undef, float [[TMP7]], i64 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP5]], i32 1			; CHECK-NEXT: [[VECIN1:%.*]] = insertelement <2 x float> [[VECIN0]], float [[TMP5]], i64 1
	; CHECK-NEXT: [[VECIN1:%.*]] = insertelement <2 x float> [[VECIN0]], float [[TMP8]], i64 1			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP6]], i32 0			; CHECK-NEXT: [[VECIN2:%.*]] = insertelement <2 x float> undef, float [[TMP6]], i64 0
	; CHECK-NEXT: [[VECIN2:%.*]] = insertelement <2 x float> undef, float [[TMP9]], i64 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP6]], i32 1			; CHECK-NEXT: [[VECIN3:%.*]] = insertelement <2 x float> [[VECIN2]], float [[TMP7]], i64 1
	; CHECK-NEXT: [[VECIN3:%.*]] = insertelement <2 x float> [[VECIN2]], float [[TMP10]], i64 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VECIN1]], 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VECIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { <2 x float>, <2 x float> } [[RET0]], <2 x float> [[VECIN3]], 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { <2 x float>, <2 x float> } [[RET0]], <2 x float> [[VECIN3]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[RET1]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
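
The right-hand CHECK lines above show the effect of the patch on this test: the four lanes of one `<4 x float>` value are extracted with indices 0 to 3 and reinserted into the two `<2 x float>` fields of the returned struct. A minimal sketch of that index arithmetic follows. The helper names `flattenedLanes` and `unflattenLane` are hypothetical and are not taken from SLPVectorizer.cpp; the sketch only models the lane mapping and assumes every field of the aggregate has the same, non-zero lane count.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helpers for illustration only; not part of SLPVectorizer.cpp.

// Number of scalar lanes in a homogeneous aggregate of vectors.
// For { <2 x float>, <2 x float> }: 2 fields * 2 lanes per field.
inline std::size_t flattenedLanes(std::size_t numFields,
                                  std::size_t lanesPerField) {
  return numFields * lanesPerField;
}

// Maps a lane of the wide vector back to the pair
// (aggregate field index, lane index within that field).
// Precondition (assumed): lanesPerField is non-zero.
inline std::pair<std::size_t, std::size_t>
unflattenLane(std::size_t wideLane, std::size_t lanesPerField) {
  return {wideLane / lanesPerField, wideLane % lanesPerField};
}
```

With two lanes per field, wide lane 2 maps to field 1, lane 0, which corresponds to the `extractelement ... i32 2` feeding `VECIN2` at index 0 in the CHECK lines.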

Revision Contents

Diff 230671

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll
