Gather nodes are vectorized as simply vector of the scalars instead of
relying on the actual node. It leads to the fact that in some cases
we may miss incorrect transformation (non-matching set of scalars is
just ended as a gather node instead of possible vector/gather node).
Better to rely on the actual nodes, it allows to improve stability and
better detect missed cases.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
It was split already into several parts that were committed, just forgot to extract smaller things, will do.
Most of the things are cleanup, one is the bugfix, IIRC.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4009–4010 | The problem is that this part does not affect current vectorization, it works only with the redesigned version. Originally we do not use gather nodes for the vectorization, just the list of scalars to produce the buildvector. That's the reason I think this must be the part of this change. | |
8129 | Will try to do it in a separate patch. | |
8200 | Will do |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4009–4010 | Sorry, I meant that the comment should updated/split as the reuses mask is always reordered now, not just for the early out. |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
4009–4010 | I see, thanks |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
8266 | Please can you explain what this poisonvalue is for and why we don't need freeze for it. |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
8266 | It is to build the the shuffle with the poison only. On line 8271 the corresponding shuffle position is set to the non-poisoned element from the buildvector. The freeze is not required because we already checked that the scalar in Pos position is non-poisoned. |
LGTM with one minor - cheers
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
8266 | OK - please add a comment explaining that. |
this change is causing a crash on the following IR
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-windows-msvc19.16.0" %"class.(anonymous namespace)::ESMatrix" = type { [4 x [4 x float]] } define void @"?Multiply@ESMatrix@?A0x78950DC3@@QEAAXPEAV1?A0x78950DC3@@0@Z"(ptr %a, ptr %b) { entry: %result = alloca %"class.(anonymous namespace)::ESMatrix", i32 0, align 4 %arrayidx11 = getelementptr [4 x [4 x float]], ptr %b, i64 0, i64 1 %0 = load float, ptr %arrayidx11, align 4 %1 = load float, ptr null, align 4 %arrayidx120 = getelementptr [4 x float], ptr %b, i64 0, i64 3 %2 = load float, ptr %arrayidx120, align 4 br label %for.body for.body: ; preds = %for.body, %entry %3 = load float, ptr %a, align 4 %mul = fmul float %3, 0.000000e+00 %arrayidx9 = getelementptr [4 x [4 x float]], ptr %a, i64 0, i64 0, i64 1 %4 = load float, ptr %arrayidx9, align 4 %mul13 = fmul float %4, %0 %add = fadd float %mul, %mul13 %add22 = fadd float %add, 0.000000e+00 store float %add22, ptr %result, align 4 %5 = load float, ptr null, align 4 %mul43 = fmul float %3, %5 %mul51 = fmul float %4, 0.000000e+00 %add52 = fadd float %mul43, %mul51 %add61 = fadd float %add52, 0.000000e+00 %arrayidx74 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 1 store float %add61, ptr %arrayidx74, align 4 %mul82 = fmul float %3, 0.000000e+00 %mul90 = fmul float %4, %1 %add91 = fadd float %mul82, %mul90 %add100 = fadd float %add91, 0.000000e+00 %arrayidx113 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 2 store float %add100, ptr %arrayidx113, align 4 %mul121 = fmul float %3, %2 %mul129 = fmul float %4, 0.000000e+00 %add130 = fadd float %mul121, %mul129 %add139 = fadd float %add130, 0.000000e+00 %arrayidx152 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 3 store float %add139, ptr %arrayidx152, align 4 br label %for.body }
$ opt -p slp-vectorizer /tmp/a.ll -disable-output opt: ../../llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8150: Value *llvm::slpvectorizer::BoUpSLP::vectorizeOperand(TreeEntry *, unsigned int): Assertion `(any_of(VE->UserTreeIndices, [E, Node Idx](const EdgeInfo &EI) { return EI.EdgeIdx == NodeIdx && EI.UserTE == E; }) || any_of(VectorizableTree, [E, NodeIdx, VE](const std::unique_ptr<TreeEntry> &TE) { return TE->isOperandGatherNode({ E, NodeIdx}) && VE->isSame(TE->Scalars); })) && "Expected same vectorizable node."' failed.
Independent change?