This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Redesign vectorization of the gather nodes.
ClosedPublic

Authored by ABataev on Oct 4 2022, 9:10 AM.

Details

Summary

Gather nodes are vectorized as simply vector of the scalars instead of
relying on the actual node. It leads to the fact that in some cases
we may miss incorrect transformation (non-matching set of scalars is
just ended as a gather node instead of possible vector/gather node).
Better to rely on the actual nodes, it allows to improve stability and
better detect missed cases.

Diff Detail

Event Timeline

ABataev created this revision.Oct 4 2022, 9:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2022, 9:10 AM
ABataev requested review of this revision.Oct 4 2022, 9:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2022, 9:10 AM

As usual, there's a lot going on in this patch! For starters - it looks like there a number of cleanup changes that can be pulled out?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3911

Independent change?

3938

Independent change?

3940

Independent change?

8100

Idx shadow variable?

As usual, there's a lot going on in this patch! For starters - it looks like there a number of cleanup changes that can be pulled out?

It was split already into several parts that were committed, just forgot to extract smaller things, will do.

Most of the things are cleanup, one is the bugfix, IIRC.

ABataev updated this revision to Diff 472921.Nov 3 2022, 6:36 AM

Rebase + address comments

a few minors

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3920–3921

Split + update comments

8072

We do this matching in a couple of places now - worth adding as a TreeEntry helper method?

8139

Add some comments describing whats happening in this method.

ABataev added inline comments.Nov 3 2022, 9:06 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3920–3921

The problem is that this part does not affect current vectorization, it works only with the redesigned version. Originally we do not use gather nodes for the vectorization, just the list of scalars to produce the buildvector. That's the reason I think this must be the part of this change.

8072

Will try to do it in a separate patch.

8139

Will do

RKSimon added inline comments.Nov 3 2022, 9:41 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3920–3921

Sorry, I meant that the comment should updated/split as the reuses mask is always reordered now, not just for the early out.

ABataev added inline comments.Nov 3 2022, 9:52 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3920–3921

I see, thanks

ABataev updated this revision to Diff 472993.Nov 3 2022, 11:33 AM

Address comments

RKSimon added inline comments.Nov 4 2022, 7:53 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
8209

Please can you explain what this poisonvalue is for and why we don't need freeze for it.

ABataev added inline comments.Nov 4 2022, 8:00 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
8209

It is to build the the shuffle with the poison only. On line 8271 the corresponding shuffle position is set to the non-poisoned element from the buildvector. The freeze is not required because we already checked that the scalar in Pos position is non-poisoned.

RKSimon accepted this revision.Nov 4 2022, 8:22 AM

LGTM with one minor - cheers

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
8209

OK - please add a comment explaining that.

This revision is now accepted and ready to land.Nov 4 2022, 8:22 AM
This revision was automatically updated to reflect the committed changes.

this change is causing a crash on the following IR

target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.16.0"

%"class.(anonymous namespace)::ESMatrix" = type { [4 x [4 x float]] }

define void @"?Multiply@ESMatrix@?A0x78950DC3@@QEAAXPEAV1?A0x78950DC3@@0@Z"(ptr %a, ptr %b) {
entry:
  %result = alloca %"class.(anonymous namespace)::ESMatrix", i32 0, align 4
  %arrayidx11 = getelementptr [4 x [4 x float]], ptr %b, i64 0, i64 1
  %0 = load float, ptr %arrayidx11, align 4
  %1 = load float, ptr null, align 4
  %arrayidx120 = getelementptr [4 x float], ptr %b, i64 0, i64 3
  %2 = load float, ptr %arrayidx120, align 4
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %3 = load float, ptr %a, align 4
  %mul = fmul float %3, 0.000000e+00
  %arrayidx9 = getelementptr [4 x [4 x float]], ptr %a, i64 0, i64 0, i64 1
  %4 = load float, ptr %arrayidx9, align 4
  %mul13 = fmul float %4, %0
  %add = fadd float %mul, %mul13
  %add22 = fadd float %add, 0.000000e+00
  store float %add22, ptr %result, align 4
  %5 = load float, ptr null, align 4
  %mul43 = fmul float %3, %5
  %mul51 = fmul float %4, 0.000000e+00
  %add52 = fadd float %mul43, %mul51
  %add61 = fadd float %add52, 0.000000e+00
  %arrayidx74 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 1
  store float %add61, ptr %arrayidx74, align 4
  %mul82 = fmul float %3, 0.000000e+00
  %mul90 = fmul float %4, %1
  %add91 = fadd float %mul82, %mul90
  %add100 = fadd float %add91, 0.000000e+00
  %arrayidx113 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 2
  store float %add100, ptr %arrayidx113, align 4
  %mul121 = fmul float %3, %2
  %mul129 = fmul float %4, 0.000000e+00
  %add130 = fadd float %mul121, %mul129
  %add139 = fadd float %add130, 0.000000e+00
  %arrayidx152 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 3
  store float %add139, ptr %arrayidx152, align 4
  br label %for.body
}
$ opt -p slp-vectorizer /tmp/a.ll -disable-output
opt: ../../llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8150: Value *llvm::slpvectorizer::BoUpSLP::vectorizeOperand(TreeEntry *, unsigned int): Assertion `(any_of(VE->UserTreeIndices, [E, Node
Idx](const EdgeInfo &EI) { return EI.EdgeIdx == NodeIdx && EI.UserTE == E; }) || any_of(VectorizableTree, [E, NodeIdx, VE](const std::unique_ptr<TreeEntry> &TE) { return TE->isOperandGatherNode({
E, NodeIdx}) && VE->isSame(TE->Scalars); })) && "Expected same vectorizable node."' failed.                                                                                                        

this change is causing a crash on the following IR

target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.16.0"

%"class.(anonymous namespace)::ESMatrix" = type { [4 x [4 x float]] }

define void @"?Multiply@ESMatrix@?A0x78950DC3@@QEAAXPEAV1?A0x78950DC3@@0@Z"(ptr %a, ptr %b) {
entry:
  %result = alloca %"class.(anonymous namespace)::ESMatrix", i32 0, align 4
  %arrayidx11 = getelementptr [4 x [4 x float]], ptr %b, i64 0, i64 1
  %0 = load float, ptr %arrayidx11, align 4
  %1 = load float, ptr null, align 4
  %arrayidx120 = getelementptr [4 x float], ptr %b, i64 0, i64 3
  %2 = load float, ptr %arrayidx120, align 4
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %3 = load float, ptr %a, align 4
  %mul = fmul float %3, 0.000000e+00
  %arrayidx9 = getelementptr [4 x [4 x float]], ptr %a, i64 0, i64 0, i64 1
  %4 = load float, ptr %arrayidx9, align 4
  %mul13 = fmul float %4, %0
  %add = fadd float %mul, %mul13
  %add22 = fadd float %add, 0.000000e+00
  store float %add22, ptr %result, align 4
  %5 = load float, ptr null, align 4
  %mul43 = fmul float %3, %5
  %mul51 = fmul float %4, 0.000000e+00
  %add52 = fadd float %mul43, %mul51
  %add61 = fadd float %add52, 0.000000e+00
  %arrayidx74 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 1
  store float %add61, ptr %arrayidx74, align 4
  %mul82 = fmul float %3, 0.000000e+00
  %mul90 = fmul float %4, %1
  %add91 = fadd float %mul82, %mul90
  %add100 = fadd float %add91, 0.000000e+00
  %arrayidx113 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 2
  store float %add100, ptr %arrayidx113, align 4
  %mul121 = fmul float %3, %2
  %mul129 = fmul float %4, 0.000000e+00
  %add130 = fadd float %mul121, %mul129
  %add139 = fadd float %add130, 0.000000e+00
  %arrayidx152 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 3
  store float %add139, ptr %arrayidx152, align 4
  br label %for.body
}
$ opt -p slp-vectorizer /tmp/a.ll -disable-output
opt: ../../llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8150: Value *llvm::slpvectorizer::BoUpSLP::vectorizeOperand(TreeEntry *, unsigned int): Assertion `(any_of(VE->UserTreeIndices, [E, Node
Idx](const EdgeInfo &EI) { return EI.EdgeIdx == NodeIdx && EI.UserTE == E; }) || any_of(VectorizableTree, [E, NodeIdx, VE](const std::unique_ptr<TreeEntry> &TE) { return TE->isOperandGatherNode({
E, NodeIdx}) && VE->isSame(TE->Scalars); })) && "Expected same vectorizable node."' failed.                                                                                                        

Must be fixed in 0a33ceee0105c94060c8a6089a2e489a8a7a5cb7