This is an archive of the discontinued LLVM Phabricator instance.

Differential D115750

[SLP]Further improvement of the cost model for scalars used in buildvectors.
ClosedPublic

Authored by ABataev on Dec 14 2021, 12:08 PM.

Details

Reviewers

RKSimon
anton-afanasyev
dtemirbulatov

Commits

rGf5d45d70a511: [SLP]Further improvement of the cost model for scalars used in buildvectors.
rG99f31acfce33: [SLP]Further improvement of the cost model for scalars used in buildvectors.

Summary

Further improvement of the cost model for the scalars used in buildvectors sequences. The main functionality is outlined into a separate function.

The cost is calculated in the following way:

1. If the Base vector is not an undef vector, resize the very first mask to have a common VF and perform the action for 2 input vectors (including the non-undef Base). Other shuffle masks are combined with the result of the first stage and processed as a shuffle of 2 elements.
2. If the Base is an undef vector and there is only 1 shuffle mask, perform the action only for 1 vector with the given mask, if it is not the identity mask.
3. If more than 2 masks are used, perform a series of shuffle actions for 2 vectors, combining the masks properly between the steps.

The original implementation misses the very first analysis for the Base vector, so the cost might be too optimistic in some cases. But it improves the cost for the insertelements which are part of the current SLP graph.
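As a rough illustration of the three cases above, the following standalone sketch counts how many shuffle actions would be requested for a given list of masks. It is not the patch's code: `countShuffleActions`, `isIdentityMask` and `UndefLane` are hypothetical names, plain `std::vector` stands in for the LLVM containers, and identity detection is simplified (in the patch that decision comes back from the resize callback).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: -1 marks an undefined lane, like UndefMaskElem in LLVM.
constexpr int UndefLane = -1;

// Hypothetical helper: a mask is an identity if every defined lane I maps to I.
static bool isIdentityMask(const std::vector<int> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] != UndefLane && Mask[I] != static_cast<int>(I))
      return false;
  return true;
}

// Counts the two-input (or single-input) shuffle actions for the three cases.
static unsigned
countShuffleActions(bool BaseIsUndef,
                    const std::vector<std::vector<int>> &Masks) {
  assert(!Masks.empty() && "Empty list of shuffles.");
  const unsigned N = static_cast<unsigned>(Masks.size());
  // Case 1: Base + first vector, then one action per remaining vector.
  if (!BaseIsUndef)
    return N;
  // Case 2: a single vector needs an action only for a non-identity mask.
  if (N == 1)
    return isIdentityMask(Masks.front()) ? 0 : 1;
  // Case 3: first two vectors together, then one action per remaining vector.
  return N - 1;
}
```

With an undef Base and one identity mask the sketch reports zero actions, which is the situation where no extra shuffle cost would be expected.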

Part of D107966.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Dec 14 2021, 12:08 PM

Herald added a subscriber: hiraditya. Dec 14 2021, 12:08 PM

ABataev requested review of this revision. Dec 14 2021, 12:08 PM

Herald added a project: Restricted Project. Dec 14 2021, 12:08 PM

Harbormaster completed remote builds in B139277: Diff 394339. Dec 14 2021, 1:03 PM

RKSimon edited the summary of this revision. Jan 3 2022, 8:54 AM

Rebase

Herald added a project: Restricted Project. Apr 29 2022, 10:55 AM

Herald added a subscriber: vporpo.

Harbormaster completed remote builds in B162036: Diff 426117. Apr 29 2022, 1:27 PM

Anything we can do to simplify this patch would be great - there's a lot going on.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6329	Do you intend to use performExtractsShuffleAction more than once in the future? Otherwise some of these function_ref seem superfluous.
6473–6474	Some of this is NFC - pre-commit to reduce the patch?
6477–6480	IEBase is invariant to the inner do-while loop - should this still use Base?
6495–6502	Pulling out Idx is just a NFC - precommit?

ABataev added inline comments. May 1 2022, 6:36 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6329	Yes, this patch is part of D107966, which unifies cost estimation and code emission.
6477–6480	I'll check
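To illustrate why callback parameters can pay off when one driver is meant to serve both cost estimation and code emission, here is a minimal standalone sketch. It is an assumption-laden toy, not the patch: `foldPairwise`, `estimateCost` and `emitShuffles` are hypothetical names, `std::function` stands in for `llvm::function_ref`, and the unit cost per shuffle is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical driver: combines the inputs two at a time through Action.
template <typename T>
static T foldPairwise(const std::vector<T> &Inputs,
                      const std::function<T(const T &, const T &)> &Action) {
  assert(!Inputs.empty() && "Nothing to combine.");
  T Prev = Inputs.front();
  for (std::size_t I = 1; I < Inputs.size(); ++I)
    Prev = Action(Prev, Inputs[I]);
  return Prev;
}

// Cost flavour: each two-input shuffle adds one unit (made-up cost).
static int estimateCost(const std::vector<int> &InputCosts) {
  return foldPairwise<int>(
      InputCosts, [](const int &A, const int &B) { return A + B + 1; });
}

// Emission flavour: the same driver builds a textual shuffle tree.
static std::string emitShuffles(const std::vector<std::string> &Names) {
  return foldPairwise<std::string>(
      Names, [](const std::string &A, const std::string &B) {
        return "shuffle(" + A + ", " + B + ")";
      });
}
```

Both flavours walk the inputs in the same order, so a cost computed this way would describe the same sequence of shuffles that the emission flavour produces.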

RKSimon added inline comments. May 1 2022, 6:47 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6563	It might make it cleaner if we pull out the lambda instead of embedding inside the performExtractsShuffleAction call?

Address comments + rebase.

Harbormaster completed remote builds in B162318: Diff 426506. May 2 2022, 2:29 PM

RKSimon added inline comments. May 3 2022, 6:24 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583	Is this correct - not sure if you want braces here or just a newline

ABataev added inline comments. May 3 2022, 8:37 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583	Want to dump TreeEntries here, if possible, helps to get more info about the graph, rather than just an instruction.

ABataev added inline comments. May 3 2022, 8:45 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583	Ah, did not understand the question. Everything is ok here, just the first element might be a `nullptr`

RKSimon added inline comments. May 3 2022, 8:47 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583	OK - but maybe put TEs.back()->dump(); on a separate line so it doesn't look like it should be:

if (TEs.front()) {
  TEs.front()->dump();
  TEs.back()->dump();
}

I'm very surprised clang-format didn't catch this.

ABataev added inline comments. May 3 2022, 8:53 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583	Will enclose first dump into braces explicitly, clang-format wants to make them in one line.

RKSimon added inline comments. May 3 2022, 9:04 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583	That's scary - I guess because this is all inside the LLVM_DEBUG macro...
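The readability concern in this exchange can be reproduced outside LLVM. The sketch below uses a hypothetical `Entry` type and records names in a vector instead of calling `dump()`. Both functions behave the same; the first one only looks as if the `if` guards two statements.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a tree entry that can be dumped.
struct Entry {
  std::string Name;
};

// Misleading layout: only the first statement is guarded by the `if`.
static std::vector<std::string> dumpMisleading(const Entry *Front,
                                               const Entry *Back) {
  std::vector<std::string> Out;
  if (Front) Out.push_back(Front->Name); Out.push_back(Back->Name);
  return Out;
}

// Explicit braces make it obvious that Back is always dumped.
static std::vector<std::string> dumpExplicit(const Entry *Front,
                                             const Entry *Back) {
  std::vector<std::string> Out;
  if (Front) {
    Out.push_back(Front->Name);
  }
  Out.push_back(Back->Name);
  return Out;
}
```

Bracing the guarded statement explicitly, as agreed in the thread, keeps the intent visible even when a formatter joins the statements onto one line.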

Rebase + formatting

Harbormaster completed remote builds in B162504: Diff 426768. May 3 2022, 12:12 PM

LGTM

This revision is now accepted and ready to land. May 4 2022, 8:59 AM

This revision was landed with ongoing or failed builds. May 5 2022, 6:06 AM

Closed by commit rG99f31acfce33: [SLP]Further improvement of the cost model for scalars used in buildvectors. (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG99f31acfce33: [SLP]Further improvement of the cost model for scalars used in buildvectors.

asbirlea mentioned this in D100486: [COST]Improve cost model for shuffles in SLP. May 5 2022, 11:20 AM

This caused (or exposed?) failing asserts:

$ cat sqrtf.c
float *a;
float c;
float sqrtf(float);
void b() {
  float d, e, f, g;
  d = c * (0 + 2 * sqrtf(c) * 0);
  e = c * (0 - (c + 1) * 0);
  f = c * (0 - 2 * sqrtf(c) * 0);
  g = 2 * (0 + (c + 1) * 0);
  a[0] = d / 0;
  a[1] = e / 0;
  a[2] = f / 0;
  a[3] = g / 0;
}
$ clang -target x86_64-linux-gnu -c -O2 -fno-math-errno sqrtf.c
clang: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:6592: llvm::InstructionCost llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>): Assertion `Mask[InIdx] == UndefMaskElem && "InsertElementInstruction used already."' failed.

In D115750#3499441, @mstorsjo wrote:

This caused (or exposed?) failing asserts:

Hi, thanks for the report, feel free to revert, I'll fix it tomorrow.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583	Yes, I believe so.

repro:

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "fuzz.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define dso_local void @b() local_unnamed_addr #0 {
entry:
  %mul = fmul float undef, 2.000000e+00
  %i = tail call float @llvm.fmuladd.f32(float %mul, float 0.000000e+00, float 0.000000e+00)
  %mul2 = fmul float undef, %i
  %add = fadd float undef, 1.000000e+00
  %neg = fneg float %add
  %i1 = tail call float @llvm.fmuladd.f32(float %neg, float 0.000000e+00, float 0.000000e+00)
  %mul4 = fmul float undef, %i1
  %neg7 = fneg float %mul
  %i2 = tail call float @llvm.fmuladd.f32(float %neg7, float 0.000000e+00, float 0.000000e+00)
  %mul8 = fmul float undef, %i2
  %i3 = tail call float @llvm.fmuladd.f32(float %add, float 0.000000e+00, float 0.000000e+00)
  %mul11 = fmul float %i3, 2.000000e+00
  %div = fdiv float %mul2, 0.000000e+00
  store float %div, ptr undef, align 4
  %div12 = fdiv float %mul4, 0.000000e+00
  %arrayidx13 = getelementptr inbounds float, ptr undef, i64 1
  store float %div12, ptr %arrayidx13, align 4
  %div14 = fdiv float %mul8, 0.000000e+00
  %arrayidx15 = getelementptr inbounds float, ptr undef, i64 2
  store float %div14, ptr %arrayidx15, align 4
  %div16 = fdiv float %mul11, 0.000000e+00
  %arrayidx17 = getelementptr inbounds float, ptr undef, i64 3
  store float %div16, ptr %arrayidx17, align 4
  ret void
}

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.fmuladd.f32(float, float, float) #1

attributes #0 = { "tune-cpu"="generic" }
attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

i am seeing the same assert on the amdgpu target

the reproducer posted by previous two responders is way smaller than what i have

Must be fixed in 9c3a75eabf577f0e0e372be95ec4861600a5acdb

In D115750#3500827, @ABataev wrote:

Must be fixed in 9c3a75eabf577f0e0e372be95ec4861600a5acdb

still fails for me, see https://github.com/llvm/llvm-project/issues/55359 "'llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6596: llvm::InstructionCost llvm::slpvectorizer::BoUpSLP::getTreeCost(ArrayRef<llvm::Value *>): Assertion `Mask[InIdx] == UndefMaskElem && "InsertElementInstruction used already."' failed.' since https://github.com/llvm/llvm-project/commit/99f31acfce338417fea3c14983d6f8fedc8ed043 '[SLP]Further improvement of the cost model for scalars used in buildvectors.'" (which I filed so that I could attach the non-reduced reproducer there)

another issue that repros at head

$ cat /tmp/a.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define internal void @0() {
  br label %1

1:                                                ; preds = %1, %0
  %2 = fadd float 0.000000e+00, 0.000000e+00
  %3 = fadd float 0.000000e+00, 0.000000e+00
  %4 = insertelement <2 x float> zeroinitializer, float %2, i64 0
  %5 = insertelement <2 x float> %4, float %3, i64 1
  %6 = insertelement <2 x float> %5, float %2, i64 0
  br label %1
}
$ ./build/rel/bin/opt -passes=slp-vectorizer -disable-output /tmp/a.ll
opt: ../../llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6500: llvm::InstructionCost llvm::slpvectorizer::BoUpSLP::getTreeCost(ArrayRef<llvm::Value *>): Assertion `Mask[InIdx] == UndefMaskElem && "InsertElementInstruction used already."' failed.

(I did see a different instance of this crash get fixed with some other commit but this one still remains)

In D115750#3501807, @aeubanks wrote:

another issue that repros at head

Yes, have a fix for this issue already, need to relax an assert, will commit in couple minutes
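The reproducer above writes lane 0 twice in one insertelement chain (`%4` and `%6`). A small standalone sketch of what happens when a per-lane mask is built from such a chain is shown here. It is illustrative only and does not claim to match the actual fix: `buildInsertMask`, `MaskResult` and `UndefLane` are hypothetical names, and the choice to let the later insert win simply follows insertelement semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: -1 marks an undefined lane, like UndefMaskElem in LLVM.
constexpr int UndefLane = -1;

struct MaskResult {
  std::vector<int> Mask; // Mask[Lane] = position of the insert that wrote it
  bool LaneReused;       // true if some lane was written more than once
};

// Builds a per-lane mask from the lanes written by a chain of inserts,
// listed from the first insert to the last one.
static MaskResult buildInsertMask(const std::vector<unsigned> &InsertLanes,
                                  unsigned VF) {
  MaskResult R{std::vector<int>(VF, UndefLane), false};
  for (std::size_t I = 0; I < InsertLanes.size(); ++I) {
    unsigned Lane = InsertLanes[I];
    assert(Lane < VF && "Insert index out of range.");
    // A strict check would fail here; this sketch only records the reuse.
    if (R.Mask[Lane] != UndefLane)
      R.LaneReused = true;
    R.Mask[Lane] = static_cast<int>(I); // the later insert wins
  }
  return R;
}
```

For the chain in the reproducer (lanes 0, 1, 0 with a vector width of 2), the sketch flags the reuse and keeps the third insert in lane 0.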

Decided to revert this patch and several others to fix the bugs.

another crash even with https://reviews.llvm.org/rGcce80bd8b74d54deb82b1b6ae0cbec1ab53c1dbb

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define internal void @0() {
.lr.ph.i41:
  br label %.lr.ph.split.us.i

.lr.ph.split.us.i:                                ; preds = %.lr.ph.split.us.i, %.lr.ph.i41
  %0 = fadd float 0.000000e+00, 0.000000e+00
  %1 = fadd float 0.000000e+00, 0.000000e+00
  %2 = fadd float %0, 0.000000e+00
  %3 = fadd float %1, 0.000000e+00
  %.sroa.3.8.vec.insert.i.us.i = insertelement <2 x float> zeroinitializer, float %2, i64 0
  %.sroa.3.12.vec.insert.i.us.i = insertelement <2 x float> %.sroa.3.8.vec.insert.i.us.i, float %3, i64 1
  %.sroa.025.4.vec.insert.us.i = insertelement <2 x float> %.sroa.3.12.vec.insert.i.us.i, float %0, i64 0
  br label %.lr.ph.split.us.i
}

In D115750#3501860, @aeubanks wrote:

another crash even with https://reviews.llvm.org/rGcce80bd8b74d54deb82b1b6ae0cbec1ab53c1dbb

Thanks, will add it to fixed version of the patch

(if you're interested in testing the recommit, consider running opt -O2 on the attached file, that's where I've been pulling the crashes from)

b.ll.txt (2 MB)

ABataev added a reverting change: rG4212ef8a0e5c: Revert "[SLP]Further improvement of the cost model for scalars used in…. May 9 2022, 1:59 PM

ABataev added a commit: rGf5d45d70a511: [SLP]Further improvement of the cost model for scalars used in buildvectors. May 11 2022, 6:09 AM

fhahn mentioned this in D123409: [AArch64] Use PerfectShuffle costs in AArch64TTIImpl::getShuffleCost. May 13 2022, 8:14 AM

Another crash caused by this:

$ cat /tmp/f.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-android23"

%0 = type { float, float, float, float }

; Function Attrs: argmemonly nocallback nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #0

; Function Attrs: argmemonly nocallback nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #0

; Function Attrs: argmemonly nofree nounwind willreturn writeonly
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1

define internal void @0() {
  %1 = getelementptr inbounds %0, ptr undef, i64 0, i32 2
  %2 = load float, ptr %1, align 4
  %3 = load float, ptr undef, align 4
  %4 = fsub float %2, %3
  %5 = getelementptr inbounds %0, ptr undef, i64 0, i32 3
  %6 = load float, ptr %5, align 4
  %7 = getelementptr inbounds %0, ptr undef, i64 0, i32 1
  %8 = load float, ptr %7, align 4
  %9 = fsub float %6, %8
  %10 = fcmp olt float %9, %4
  %11 = insertelement <2 x float> undef, float %3, i64 0
  %12 = insertelement <2 x float> zeroinitializer, float 0.000000e+00, i64 0
  store <2 x float> zeroinitializer, ptr null, align 4
  %13 = insertelement <2 x float> %11, float %6, i64 0
  store <2 x float> zeroinitializer, ptr null, align 4
  ret void
}

attributes #0 = { argmemonly nocallback nofree nosync nounwind willreturn }
attributes #1 = { argmemonly nofree nounwind willreturn writeonly }
$ bin/opt -passes=slp-vectorizer -disable-output /tmp/f.ll
opt: ../../llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6657: T *performExtractsShuffleAction(MutableArrayRef<std::pair<T *, SmallVector<int>>>, llvm::Value *, function_ref<unsigned int (T *)>, function_ref<std::pair<T *, bool> (T *, ArrayRef<int>)>, function_ref<T *(ArrayRef<int>, ArrayRef<T *>)>) [T = const llvm::slpvectorizer::BoUpSLP::TreeEntry]: Assertion `Mask[I] == UndefMaskElem && "Multiple uses of scalars."' failed.

In D115750#3511967, @aeubanks wrote:

Another crash caused by this:

Another corner case of too strict assertion I was afraid of. Will fix it ASAP.
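The failing check sits in the step that merges the masks of two equally sized input vectors. A standalone sketch of that merge follows, using `std::vector` instead of LLVM containers. It reports a lane conflict through its return value instead of asserting; `combineEqualVFMasks` and `UndefLane` are hypothetical names and the conflict handling is an assumption, not the committed fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: -1 marks an undefined lane, like UndefMaskElem in LLVM.
constexpr int UndefLane = -1;

// Merges SecMask (lanes of the second vector) into Mask (lanes of the first).
// Lanes taken from the second vector are offset by Vec1VF, as in a two-input
// shuffle mask. Returns false if both masks define the same lane.
static bool combineEqualVFMasks(std::vector<int> &Mask,
                                const std::vector<int> &SecMask,
                                unsigned Vec1VF) {
  assert(SecMask.size() >= Mask.size() && "Second mask is too short.");
  bool NoConflict = true;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (SecMask[I] == UndefLane)
      continue;
    if (Mask[I] != UndefLane)
      NoConflict = false; // the strict assertion would fire here
    Mask[I] = SecMask[I] + static_cast<int>(Vec1VF);
  }
  return NoConflict;
}
```

In the reproducer, two inserts target lane 0 of the same `<2 x float>` value, which corresponds to the conflicting case of this sketch.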

In D115750#3511967, @aeubanks wrote:

Another crash caused by this:

Fixed in 85f6b15ee50f

another one :)

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-android23"

%0 = type { float, float, float, float }

define internal void @0() {
  %1 = getelementptr inbounds %0, ptr undef, i64 0, i32 2
  %2 = load float, ptr %1, align 4
  %3 = load float, ptr undef, align 4
  %4 = fsub float %2, %3
  %5 = getelementptr inbounds %0, ptr undef, i64 0, i32 3
  %6 = load float, ptr %5, align 4
  %7 = getelementptr inbounds %0, ptr undef, i64 0, i32 1
  %8 = load float, ptr %7, align 4
  %9 = fsub float %6, %8
  %10 = fcmp olt float %9, %4
  %.sroa.0.0.vec.insert.i5.i10 = insertelement <2 x float> undef, float %3, i64 0
  %.sroa.0.4.vec.insert.i10.i13 = insertelement <2 x float> %.sroa.0.0.vec.insert.i5.i10, float %8, i64 1
  store <2 x float> %.sroa.0.4.vec.insert.i10.i13, ptr null, align 4
  %.sroa.0.4.vec.insert.i10.i13.2 = insertelement <2 x float> %.sroa.0.0.vec.insert.i5.i10, float %6, i64 1
  store <2 x float> %.sroa.0.4.vec.insert.i10.i13.2, ptr null, align 4
  ret void
}

In D115750#3513649, @aeubanks wrote:

another one :)

Fixed in 152072801e24fb1e5cd962b0cb089230bc27b6b9

Revision Contents

Path                                                                      Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp                           295 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling-inseltpoison.ll   18 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling.ll                18 lines
llvm/test/Transforms/SLPVectorizer/X86/extracts-with-undefs.ll            28 lines

Diff 427293

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

(6,284 lines not shown)	if (IE2) {
IE2 = nullptr;		IE2 = nullptr;
else		else
IE2 = dyn_cast<InsertElementInst>(IE2->getOperand(0));		IE2 = dyn_cast<InsertElementInst>(IE2->getOperand(0));
}		}
} while (IE1 || IE2);		} while (IE1 || IE2);
return false;		return false;
}		}

		/// Checks if the \p IE1 instructions is followed by \p IE2 instruction in the
		/// buildvector sequence.
		static bool isFirstInsertElement(const InsertElementInst *IE1,
		                                 const InsertElementInst *IE2) {
		  const auto *I1 = IE1;
		  const auto *I2 = IE2;
		  do {
		    if (I2 == IE1)
		      return true;
		    if (I1 == IE2)
		      return false;
		    if (I1)
		      I1 = dyn_cast<InsertElementInst>(I1->getOperand(0));
		    if (I2)
		      I2 = dyn_cast<InsertElementInst>(I2->getOperand(0));
		  } while (I1 || I2);
		  llvm_unreachable("Two different buildvectors not expected.");
		}

		/// Does the analysis of the provided shuffle masks and performs the requested
		/// actions on the vectors with the given shuffle masks. It tries to do it in
		/// several steps.
		/// 1. If the Base vector is not undef vector, resizing the very first mask to
		/// have common VF and perform action for 2 input vectors (including non-undef
		/// Base). Other shuffle masks are combined with the resulting after the 1 stage
		/// and processed as a shuffle of 2 elements.
		/// 2. If the Base is undef vector and have only 1 shuffle mask, perform the
		/// action only for 1 vector with the given mask, if it is not the identity
		/// mask.
		/// 3. If > 2 masks are used, perform the remaining shuffle actions for 2
		/// vectors, combing the masks properly between the steps.
		template <typename T>
		static T *performExtractsShuffleAction(
		    MutableArrayRef<std::pair<T *, SmallVector<int>>> ShuffleMask, Value *Base,
		    function_ref<unsigned(T *)> GetVF,
		    function_ref<std::pair<T *, bool>(T *, ArrayRef<int>)> ResizeAction,
		    function_ref<T *(ArrayRef<int>, ArrayRef<T *>)> Action) {
		  assert(!ShuffleMask.empty() && "Empty list of shuffles for inserts.");
		  SmallVector<int> Mask(ShuffleMask.begin()->second);
		  auto VMIt = std::next(ShuffleMask.begin());
		  T *Prev = nullptr;
		  bool IsBaseNotUndef = !isUndefVector(Base);
		  if (IsBaseNotUndef) {
		    // Base is not undef, need to combine it with the next subvectors.
		    std::pair<T *, bool> Res = ResizeAction(ShuffleMask.begin()->first, Mask);
		    for (unsigned Idx = 0, VF = Mask.size(); Idx < VF; ++Idx) {
		      if (Mask[Idx] == UndefMaskElem)
		        Mask[Idx] = Idx;
		      else
		        Mask[Idx] = (Res.second ? Idx : Mask[Idx]) + VF;
		    }
		    Prev = Action(Mask, {nullptr, Res.first});
		  } else if (ShuffleMask.size() == 1) {
		    // Base is undef and only 1 vector is shuffled - perform the action only for
		    // single vector, if the mask is not the identity mask.
		    std::pair<T *, bool> Res = ResizeAction(ShuffleMask.begin()->first, Mask);
		    if (Res.second)
		      // Identity mask is found.
		      Prev = Res.first;
		    else
		      Prev = Action(Mask, {ShuffleMask.begin()->first});
		  } else {
		    // Base is undef and at least 2 input vectors shuffled - perform 2 vectors
		    // shuffles step by step, combining shuffle between the steps.
		    unsigned Vec1VF = GetVF(ShuffleMask.begin()->first);
		    unsigned Vec2VF = GetVF(VMIt->first);
		    if (Vec1VF == Vec2VF) {
		      // No need to resize the input vectors since they are of the same size, we
		      // can shuffle them directly.
		      ArrayRef<int> SecMask = VMIt->second;
		      for (unsigned I = 0, VF = Mask.size(); I < VF; ++I) {
		        if (SecMask[I] != UndefMaskElem) {
		          assert(Mask[I] == UndefMaskElem && "Multiple uses of scalars.");
		          Mask[I] = SecMask[I] + Vec1VF;
		        }
		      }
		      Prev = Action(Mask, {ShuffleMask.begin()->first, VMIt->first});
		    } else {
		      // Vectors of different sizes - resize and reshuffle.
		      std::pair<T *, bool> Res1 =
		          ResizeAction(ShuffleMask.begin()->first, Mask);
		      std::pair<T *, bool> Res2 = ResizeAction(VMIt->first, VMIt->second);
		      ArrayRef<int> SecMask = VMIt->second;
		      for (unsigned I = 0, VF = Mask.size(); I < VF; ++I) {
		        if (Mask[I] != UndefMaskElem) {
		          assert(SecMask[I] == UndefMaskElem && "Multiple uses of scalars.");
		          if (Res1.second)
		            Mask[I] = I;
		        } else if (SecMask[I] != UndefMaskElem) {
		          assert(Mask[I] == UndefMaskElem && "Multiple uses of scalars.");
		          Mask[I] = (Res2.second ? I : SecMask[I]) + VF;
		        }
		      }
		      Prev = Action(Mask, {Res1.first, Res2.first});
		    }
		    VMIt = std::next(VMIt);
		  }
		  // Perform requested actions for the remaining masks/vectors.
		  for (auto E = ShuffleMask.end(); VMIt != E; ++VMIt) {
		    // Shuffle other input vectors, if any.
		    std::pair<T *, bool> Res = ResizeAction(VMIt->first, VMIt->second);
		    ArrayRef<int> SecMask = VMIt->second;
		    for (unsigned I = 0, VF = Mask.size(); I < VF; ++I) {
		      if (SecMask[I] != UndefMaskElem) {
		        assert((Mask[I] == UndefMaskElem || IsBaseNotUndef) &&
		               "Multiple uses of scalars.");
		        Mask[I] = (Res.second ? I : SecMask[I]) + VF;
		      } else if (Mask[I] != UndefMaskElem) {
		        Mask[I] = I;
		      }
		    }
		    Prev = Action(Mask, {Prev, Res.first});
		  }
		  return Prev;
		}

InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {		InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
InstructionCost Cost = 0;		InstructionCost Cost = 0;
LLVM_DEBUG(dbgs() << "SLP: Calculating cost for tree of size "		LLVM_DEBUG(dbgs() << "SLP: Calculating cost for tree of size "
<< VectorizableTree.size() << ".\n");		<< VectorizableTree.size() << ".\n");

unsigned BundleWidth = VectorizableTree[0]->Scalars.size();		unsigned BundleWidth = VectorizableTree[0]->Scalars.size();

for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {		for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {
TreeEntry &TE = *VectorizableTree[I];		TreeEntry &TE = *VectorizableTree[I];

InstructionCost C = getEntryCost(&TE, VectorizedVals);		InstructionCost C = getEntryCost(&TE, VectorizedVals);
Cost += C;		Cost += C;
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
<< " for bundle that starts with " << *TE.Scalars[0]		<< " for bundle that starts with " << *TE.Scalars[0]
<< ".\n"		<< ".\n"
<< "SLP: Current total cost = " << Cost << "\n");		<< "SLP: Current total cost = " << Cost << "\n");
}		}

SmallPtrSet<Value *, 16> ExtractCostCalculated;		SmallPtrSet<Value *, 16> ExtractCostCalculated;
InstructionCost ExtractCost = 0;		InstructionCost ExtractCost = 0;
SmallVector<unsigned> VF;		SmallVector<MapVector<const TreeEntry *, SmallVector<int>>> ShuffleMasks;
SmallVector<SmallVector<int>> ShuffleMask;		SmallVector<std::pair<Value *, const TreeEntry *>> FirstUsers;
SmallVector<Value *> FirstUsers;
SmallVector<APInt> DemandedElts;		SmallVector<APInt> DemandedElts;
for (ExternalUser &EU : ExternalUses) {		for (ExternalUser &EU : ExternalUses) {
// We only add extract cost once for the same scalar.		// We only add extract cost once for the same scalar.
if (!isa_and_nonnull<InsertElementInst>(EU.User) &&		if (!isa_and_nonnull<InsertElementInst>(EU.User) &&
!ExtractCostCalculated.insert(EU.Scalar).second)		!ExtractCostCalculated.insert(EU.Scalar).second)
continue;		continue;

// Uses by ephemeral values are free (because the ephemeral value will be		// Uses by ephemeral values are free (because the ephemeral value will be
(12 lines not shown)	if (isa<ExtractElementInst>(EU.Scalar))
continue;		continue;

// If found user is an insertelement, do not calculate extract cost but try		// If found user is an insertelement, do not calculate extract cost but try
// to detect it as a final shuffled/identity match.		// to detect it as a final shuffled/identity match.
if (auto *VU = dyn_cast_or_null<InsertElementInst>(EU.User)) {		if (auto *VU = dyn_cast_or_null<InsertElementInst>(EU.User)) {
if (auto *FTy = dyn_cast<FixedVectorType>(VU->getType())) {		if (auto *FTy = dyn_cast<FixedVectorType>(VU->getType())) {
Optional<unsigned> InsertIdx = getInsertIndex(VU);		Optional<unsigned> InsertIdx = getInsertIndex(VU);
if (InsertIdx) {		if (InsertIdx) {
auto It = find_if(FirstUsers, [VU](Value *V) {		const TreeEntry *ScalarTE = getTreeEntry(EU.Scalar);
return areTwoInsertFromSameBuildVector(VU,		auto *It =
cast<InsertElementInst>(V));		find_if(FirstUsers,
		[VU](const std::pair<Value *, const TreeEntry *> &Pair) {
		return areTwoInsertFromSameBuildVector(
		VU, cast<InsertElementInst>(Pair.first));
});		});
int VecId = -1;		int VecId = -1;
if (It == FirstUsers.end()) {		if (It == FirstUsers.end()) {
VF.push_back(FTy->getNumElements());		(void)ShuffleMasks.emplace_back();
ShuffleMask.emplace_back(VF.back(), UndefMaskElem);
// Find the insertvector, vectorized in tree, if any.		// Find the insertvector, vectorized in tree, if any.
Value *Base = VU;		Value *Base = VU;
while (auto *IEBase = dyn_cast<InsertElementInst>(Base)) {		while (auto *IEBase = dyn_cast<InsertElementInst>(Base)) {
// Build the mask for the vectorized insertelement instructions.		// Build the mask for the vectorized insertelement instructions.
if (const TreeEntry *E = getTreeEntry(IEBase)) {		if (const TreeEntry *E = getTreeEntry(IEBase)) {
VU = IEBase;		VU = IEBase;
		RKSimon: Some of this is NFC - pre-commit to reduce the patch?
do {		do {
int Idx = E->findLaneForValue(Base);		int Idx = E->findLaneForValue(Base);
ShuffleMask.back()[Idx] = Idx;		SmallVectorImpl<int> &Mask = ShuffleMasks.back()[ScalarTE];
		if (Mask.empty())
		Mask.assign(FTy->getNumElements(), UndefMaskElem);
		Mask[Idx] = Idx;
		RKSimon: IEBase is invariant to the inner do-while loop - should this still use Base?
		ABataev: I'll check
Base = cast<InsertElementInst>(Base)->getOperand(0);		Base = cast<InsertElementInst>(Base)->getOperand(0);
} while (E == getTreeEntry(Base));		} while (E == getTreeEntry(Base));
break;		break;
}		}
Base = cast<InsertElementInst>(Base)->getOperand(0);		Base = cast<InsertElementInst>(Base)->getOperand(0);
}		}
FirstUsers.push_back(VU);		FirstUsers.emplace_back(VU, ScalarTE);
DemandedElts.push_back(APInt::getZero(VF.back()));		DemandedElts.push_back(APInt::getZero(FTy->getNumElements()));
VecId = FirstUsers.size() - 1;		VecId = FirstUsers.size() - 1;
} else {		} else {
		if (isFirstInsertElement(VU, cast<InsertElementInst>(It->first)))
		It->first = VU;
VecId = std::distance(FirstUsers.begin(), It);		VecId = std::distance(FirstUsers.begin(), It);
}		}
int InIdx = *InsertIdx;		int InIdx = *InsertIdx;
ShuffleMask[VecId][InIdx] = EU.Lane;		SmallVectorImpl<int> &Mask = ShuffleMasks[VecId][ScalarTE];
		if (Mask.empty())
		Mask.assign(FTy->getNumElements(), UndefMaskElem);
		assert(Mask[InIdx] == UndefMaskElem &&
		"InsertElementInstruction used already.");
		Mask[InIdx] = EU.Lane;
DemandedElts[VecId].setBit(InIdx);		DemandedElts[VecId].setBit(InIdx);
		RKSimon: Pulling out Idx is just a NFC - precommit?
continue;		continue;
}		}
}		}
}		}

// If we plan to rewrite the tree in a smaller type, we will need to sign		// If we plan to rewrite the tree in a smaller type, we will need to sign
// extend the extracted value back to the original type. Here, we account		// extend the extracted value back to the original type. Here, we account
// for the extract and the added cost of the sign extend if needed.		// for the extract and the added cost of the sign extend if needed.
Show All 9 Lines	for (ExternalUser &EU : ExternalUses) {
} else {		} else {
ExtractCost +=		ExtractCost +=
TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, EU.Lane);		TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, EU.Lane);
}		}
}		}

InstructionCost SpillCost = getSpillCost();		InstructionCost SpillCost = getSpillCost();
Cost += SpillCost + ExtractCost;		Cost += SpillCost + ExtractCost;
if (FirstUsers.size() == 1) {		auto &&ResizeToVF = [this, &Cost](const TreeEntry *TE, ArrayRef<int> Mask) {
int Limit = ShuffleMask.front().size() * 2;		InstructionCost C = 0;
if (!all_of(ShuffleMask.front(),		unsigned VF = Mask.size();
[Limit](int Idx) { return Idx < Limit; }) ||		unsigned VecVF = TE->getVectorFactor();
!ShuffleVectorInst::isIdentityMask(ShuffleMask.front())) {		if (VF != VecVF &&
InstructionCost C = TTI->getShuffleCost(		(any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); }) ||
		(all_of(Mask,
		[VF](int Idx) { return Idx < 2 * static_cast<int>(VF); }) &&
		!ShuffleVectorInst::isIdentityMask(Mask)))) {
		SmallVector<int> OrigMask(VecVF, UndefMaskElem);
		std::copy(Mask.begin(), std::next(Mask.begin(), std::min(VF, VecVF)),
		OrigMask.begin());
		C = TTI->getShuffleCost(
TTI::SK_PermuteSingleSrc,		TTI::SK_PermuteSingleSrc,
cast<FixedVectorType>(FirstUsers.front()->getType()),		FixedVectorType::get(TE->getMainOp()->getType(), VecVF), OrigMask);
ShuffleMask.front());		LLVM_DEBUG(
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		dbgs() << "SLP: Adding cost " << C
<< " for final shuffle of insertelement external users "		<< " for final shuffle of insertelement external users.\n";
<< *VectorizableTree.front()->Scalars.front() << ".\n"		TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
<< "SLP: Current total cost = " << Cost << "\n");
Cost += C;		Cost += C;
		return std::make_pair(TE, true);
}		}
InstructionCost InsertCost = TTI->getScalarizationOverhead(		return std::make_pair(TE, false);
cast<FixedVectorType>(FirstUsers.front()->getType()),		};
DemandedElts.front(), /*Insert*/ true, /*Extract*/ false);		// Calculate the cost of the reshuffled vectors, if any.
LLVM_DEBUG(dbgs() << "SLP: subtracting the cost " << InsertCost		for (int I = 0, E = FirstUsers.size(); I < E; ++I) {
<< " for insertelements gather.\n"		Value *Base = cast<Instruction>(FirstUsers[I].first)->getOperand(0);
<< "SLP: Current total cost = " << Cost << "\n");		unsigned VF = ShuffleMasks[I].begin()->second.size();
Cost -= InsertCost;		auto *FTy = FixedVectorType::get(
} else if (FirstUsers.size() >= 2) {		cast<VectorType>(FirstUsers[I].first->getType())->getElementType(), VF);
unsigned MaxVF = *std::max_element(VF.begin(), VF.end());		auto Vector = ShuffleMasks[I].takeVector();
// Combined masks of the first 2 vectors.		auto &&EstimateShufflesCost = [this, FTy,
SmallVector<int> CombinedMask(MaxVF, UndefMaskElem);		&Cost](ArrayRef<int> Mask,
copy(ShuffleMask.front(), CombinedMask.begin());		ArrayRef<const TreeEntry *> TEs) {
APInt CombinedDemandedElts = DemandedElts.front().zextOrSelf(MaxVF);		assert((TEs.size() == 1 || TEs.size() == 2) &&
auto *VecTy = FixedVectorType::get(		"Expected exactly 1 or 2 tree entries.");
		RKSimon: It might make it cleaner if we pull out the lambda instead of embedding inside the performExtractsShuffleAction call?
cast<VectorType>(FirstUsers.front()->getType())->getElementType(),		if (TEs.size() == 1) {
MaxVF);		int Limit = 2 * Mask.size();
for (int I = 0, E = ShuffleMask[1].size(); I < E; ++I) {		if (!all_of(Mask, [Limit](int Idx) { return Idx < Limit; }) ||
if (ShuffleMask[1][I] != UndefMaskElem) {		!ShuffleVectorInst::isIdentityMask(Mask)) {
CombinedMask[I] = ShuffleMask[1][I] + MaxVF;
CombinedDemandedElts.setBit(I);
}
}
InstructionCost C =		InstructionCost C =
TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, VecTy, CombinedMask);		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FTy, Mask);
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
<< " for final shuffle of vector node and external "		<< " for final shuffle of insertelement "
"insertelement users "		"external users.\n";
<< *VectorizableTree.front()->Scalars.front() << ".\n"		TEs.front()->dump();
<< "SLP: Current total cost = " << Cost << "\n");		dbgs() << "SLP: Current total cost = " << Cost << "\n");
Cost += C;		Cost += C;
InstructionCost InsertCost = TTI->getScalarizationOverhead(		}
VecTy, CombinedDemandedElts, /*Insert*/ true, /*Extract*/ false);		} else {
LLVM_DEBUG(dbgs() << "SLP: subtracting the cost " << InsertCost
<< " for insertelements gather.\n"
<< "SLP: Current total cost = " << Cost << "\n");
Cost -= InsertCost;
for (int I = 2, E = FirstUsers.size(); I < E; ++I) {
if (ShuffleMask[I].empty())
continue;
// Other elements - permutation of 2 vectors (the initial one and the
// next Ith incoming vector).
unsigned VF = ShuffleMask[I].size();
for (unsigned Idx = 0; Idx < VF; ++Idx) {
int Mask = ShuffleMask[I][Idx];
if (Mask != UndefMaskElem)
CombinedMask[Idx] = MaxVF + Mask;
else if (CombinedMask[Idx] != UndefMaskElem)
CombinedMask[Idx] = Idx;
}
for (unsigned Idx = VF; Idx < MaxVF; ++Idx)
if (CombinedMask[Idx] != UndefMaskElem)
CombinedMask[Idx] = Idx;
InstructionCost C =		InstructionCost C =
TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, VecTy, CombinedMask);		TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, FTy, Mask);
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
<< " for final shuffle of vector node and external "		<< " for final shuffle of vector node and external "
"insertelement users "		"insertelement users.\n";
<< *VectorizableTree.front()->Scalars.front() << ".\n"		if (TEs.front()) { TEs.front()->dump(); } TEs.back()->dump();
		RKSimon: Is this correct - not sure if you want braces here or just a newline
		ABataev: Want to dump TreeEntries here, if possible, helps to get more info about the graph, rather than just an instruction.
		ABataev: Ah, did not understand the question. Everything is ok here, just the first element might be a `nullptr`
		RKSimon: OK - but maybe put `TEs.back()->dump();` on a separate line so it doesn't look like it should be: `if (TEs.front()) { TEs.front()->dump(); TEs.back()->dump(); }` I'm very surprised clang-format didn't catch this
		ABataev: Will enclose first dump into braces explicitly, clang-format wants to make them in one line.
		RKSimon: That's scary - I guess because this is all inside the LLVM_DEBUG macro...
		ABataev: Yes, I believe so.
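The thread above turns on how a one-line `if (A) { A->dump(); } B->dump();` reads versus how it behaves. A minimal standalone sketch of that shape follows; `Entry` and `dumpPair` are hypothetical stand-ins (not part of the patch), and the sketch records names in a vector instead of calling `dump()`. It assumes, as stated in the thread, that only the first entry may be null.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a tree entry; only a name is needed here.
struct Entry {
  std::string name;
};

// Mirrors the shape discussed in the review: the first "dump" is guarded
// by the null check, the second one is unconditional even though both
// statements sit on the same source line.
std::vector<std::string> dumpPair(const Entry *first, const Entry *last) {
  assert(last != nullptr && "only the first entry may be null");
  std::vector<std::string> log;
  if (first) { log.push_back(first->name); } log.push_back(last->name);
  return log;
}
```

With a null first entry the log still contains the last entry's name, which is the behavior the author describes; putting the second statement on its own line would only change how it reads.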
<< "SLP: Current total cost = " << Cost << "\n");		dbgs() << "SLP: Current total cost = " << Cost << "\n");
Cost += C;		Cost += C;
		}
		return TEs.back();
		};
		(void)performExtractsShuffleAction<const TreeEntry>(
		makeMutableArrayRef(Vector.data(), Vector.size()), Base,
		[](const TreeEntry *E) { return E->getVectorFactor(); }, ResizeToVF,
		EstimateShufflesCost);
InstructionCost InsertCost = TTI->getScalarizationOverhead(		InstructionCost InsertCost = TTI->getScalarizationOverhead(
cast<FixedVectorType>(FirstUsers[I]->getType()), DemandedElts[I],		cast<FixedVectorType>(FirstUsers[I].first->getType()), DemandedElts[I],
/*Insert*/ true, /*Extract*/ false);		/*Insert*/ true, /*Extract*/ false);
LLVM_DEBUG(dbgs() << "SLP: subtracting the cost " << InsertCost
<< " for insertelements gather.\n"
<< "SLP: Current total cost = " << Cost << "\n");
Cost -= InsertCost;		Cost -= InsertCost;
}		}
}

#ifndef NDEBUG		#ifndef NDEBUG
SmallString<256> Str;		SmallString<256> Str;
{		{
raw_svector_ostream OS(Str);		raw_svector_ostream OS(Str);
OS << "SLP: Spill Cost = " << SpillCost << ".\n"		OS << "SLP: Spill Cost = " << SpillCost << ".\n"
<< "SLP: Extract Cost = " << ExtractCost << ".\n"		<< "SLP: Extract Cost = " << ExtractCost << ".\n"
<< "SLP: Total Cost = " << Cost << ".\n";		<< "SLP: Total Cost = " << Cost << ".\n";
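The removed (left-hand) code builds a two-source shuffle mask by copying the first vector's mask and then offsetting every defined lane of the second mask by `MaxVF`. A simplified sketch of that combination step is below, using `std::vector<int>` instead of `SmallVector` and a local `kUndef` constant as a stand-in for `UndefMaskElem`; the function name `combineTwoMasks` is an assumption made for illustration and does not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for LLVM's UndefMaskElem; an assumption for this sketch.
constexpr int kUndef = -1;

// Combine two per-vector masks into one mask for a two-source shuffle.
// Lanes selected from the second vector are offset by maxVF, following
// the convention that indices >= VF address the second shuffle operand.
std::vector<int> combineTwoMasks(const std::vector<int> &first,
                                 const std::vector<int> &second,
                                 std::size_t maxVF) {
  assert(first.size() <= maxVF && second.size() <= maxVF);
  std::vector<int> combined(maxVF, kUndef);
  // Start from the first vector's mask.
  for (std::size_t i = 0; i < first.size(); ++i)
    combined[i] = first[i];
  // Overlay defined lanes of the second mask, shifted into the second
  // operand's index range.
  for (std::size_t i = 0; i < second.size(); ++i)
    if (second[i] != kUndef)
      combined[i] = second[i] + static_cast<int>(maxVF);
  return combined;
}
```

For example, with `maxVF = 4`, a first mask of `{0, undef, undef, undef}` and a second mask of `{undef, 2, undef, undef}` combine to `{0, 6, undef, undef}`. The new (right-hand) code instead delegates this bookkeeping to `performExtractsShuffleAction`, whose body is not shown in this part of the diff.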

llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-darwin13.3.0"			target triple = "x86_64-apple-darwin13.3.0"

	define void @_foo(double %p1, double %p2, double %p3) #0 {			define void @_foo(double %p1, double %p2, double %p3) #0 {
	; CHECK-LABEL: @_foo(			; CHECK-LABEL: @_foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TAB1:%.*]] = alloca [256 x i32], align 16			; CHECK-NEXT: [[TAB1:%.*]] = alloca [256 x i32], align 16
	; CHECK-NEXT: [[TAB2:%.*]] = alloca [256 x i32], align 16			; CHECK-NEXT: [[TAB2:%.*]] = alloca [256 x i32], align 16
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[MUL19:%.*]] = fmul double [[P1:%.*]], 1.638400e+04
	; CHECK-NEXT: [[MUL20:%.*]] = fmul double [[P3:%.*]], 1.638400e+04			; CHECK-NEXT: [[MUL20:%.*]] = fmul double [[P3:%.*]], 1.638400e+04
	; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03			; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03
	; CHECK-NEXT: [[MUL21:%.*]] = fmul double [[P2:%.*]], 1.638400e+04			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[P1:%.*]], i32 0
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[P2:%.*]], i32 1
				; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.638400e+04, double 1.638400e+04>
				; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> <double 0.000000e+00, double poison>, double [[ADD]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[T_0259:%.*]] = phi double [ 0.000000e+00, [[BB1]] ], [ [[ADD27:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x double> [ [[TMP3]], [[BB1]] ], [ [[TMP7:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[P3_ADDR_0258:%.*]] = phi double [ [[ADD]], [[BB1]] ], [ [[ADD28:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
	; CHECK-NEXT: [[VECINIT_I_I237:%.*]] = insertelement <2 x double> poison, double [[T_0259]], i32 0			; CHECK-NEXT: [[VECINIT_I_I237:%.*]] = insertelement <2 x double> poison, double [[TMP5]], i32 0
	; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I237]])			; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I237]])
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, !tbaa [[TBAA0:![0-9]+]]			; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, !tbaa [[TBAA0:![0-9]+]]
	; CHECK-NEXT: [[VECINIT_I_I:%.*]] = insertelement <2 x double> poison, double [[P3_ADDR_0258]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
				; CHECK-NEXT: [[VECINIT_I_I:%.*]] = insertelement <2 x double> poison, double [[TMP6]], i32 0
	; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I]])			; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I]])
	; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, !tbaa [[TBAA0]]			; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[ADD27]] = fadd double [[MUL19]], [[T_0259]]			; CHECK-NEXT: [[TMP7]] = fadd <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[ADD28]] = fadd double [[MUL21]], [[P3_ADDR_0258]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%tab1 = alloca [256 x i32], align 16			%tab1 = alloca [256 x i32], align 16

llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-darwin13.3.0"			target triple = "x86_64-apple-darwin13.3.0"

	define void @_foo(double %p1, double %p2, double %p3) #0 {			define void @_foo(double %p1, double %p2, double %p3) #0 {
	; CHECK-LABEL: @_foo(			; CHECK-LABEL: @_foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TAB1:%.*]] = alloca [256 x i32], align 16			; CHECK-NEXT: [[TAB1:%.*]] = alloca [256 x i32], align 16
	; CHECK-NEXT: [[TAB2:%.*]] = alloca [256 x i32], align 16			; CHECK-NEXT: [[TAB2:%.*]] = alloca [256 x i32], align 16
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[MUL19:%.*]] = fmul double [[P1:%.*]], 1.638400e+04
	; CHECK-NEXT: [[MUL20:%.*]] = fmul double [[P3:%.*]], 1.638400e+04			; CHECK-NEXT: [[MUL20:%.*]] = fmul double [[P3:%.*]], 1.638400e+04
	; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03			; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03
	; CHECK-NEXT: [[MUL21:%.*]] = fmul double [[P2:%.*]], 1.638400e+04			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[P1:%.*]], i32 0
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[P2:%.*]], i32 1
				; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.638400e+04, double 1.638400e+04>
				; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> <double 0.000000e+00, double poison>, double [[ADD]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[T_0259:%.*]] = phi double [ 0.000000e+00, [[BB1]] ], [ [[ADD27:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x double> [ [[TMP3]], [[BB1]] ], [ [[TMP7:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[P3_ADDR_0258:%.*]] = phi double [ [[ADD]], [[BB1]] ], [ [[ADD28:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
	; CHECK-NEXT: [[VECINIT_I_I237:%.*]] = insertelement <2 x double> undef, double [[T_0259]], i32 0			; CHECK-NEXT: [[VECINIT_I_I237:%.*]] = insertelement <2 x double> undef, double [[TMP5]], i32 0
	; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I237]])			; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I237]])
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, !tbaa [[TBAA0:![0-9]+]]			; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, !tbaa [[TBAA0:![0-9]+]]
	; CHECK-NEXT: [[VECINIT_I_I:%.*]] = insertelement <2 x double> undef, double [[P3_ADDR_0258]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
				; CHECK-NEXT: [[VECINIT_I_I:%.*]] = insertelement <2 x double> undef, double [[TMP6]], i32 0
	; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I]])			; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I]])
	; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, !tbaa [[TBAA0]]			; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[ADD27]] = fadd double [[MUL19]], [[T_0259]]			; CHECK-NEXT: [[TMP7]] = fadd <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[ADD28]] = fadd double [[MUL21]], [[P3_ADDR_0258]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%tab1 = alloca [256 x i32], align 16			%tab1 = alloca [256 x i32], align 16

llvm/test/Transforms/SLPVectorizer/X86/extracts-with-undefs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s

	define void @test() {			define void @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[BODY:%.*]]			; CHECK-NEXT: br label [[BODY:%.*]]
	; CHECK: body:			; CHECK: body:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x double> [ zeroinitializer, [[ENTRY:%.*]] ], [ zeroinitializer, [[BODY]] ]			; CHECK-NEXT: [[PHI1:%.*]] = phi double [ 0.000000e+00, [[ENTRY:%.*]] ], [ 0.000000e+00, [[BODY]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x double> [[TMP0]], i32 1			; CHECK-NEXT: [[PHI2:%.*]] = phi double [ 0.000000e+00, [[ENTRY]] ], [ 0.000000e+00, [[BODY]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP1]], i32 1			; CHECK-NEXT: [[MUL_I478_I:%.*]] = fmul fast double [[PHI1]], 0.000000e+00
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x double> [[TMP2]], zeroinitializer			; CHECK-NEXT: [[MUL7_I485_I:%.*]] = fmul fast double undef, 0.000000e+00
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP3]], i32 0			; CHECK-NEXT: [[ADD8_I_I:%.*]] = fadd fast double [[MUL_I478_I]], [[MUL7_I485_I]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP3]], i32 1
	; CHECK-NEXT: [[ADD8_I_I:%.*]] = fadd fast double [[TMP5]], [[TMP4]]
	; CHECK-NEXT: [[CMP42_I:%.*]] = fcmp fast ole double [[ADD8_I_I]], 0.000000e+00			; CHECK-NEXT: [[CMP42_I:%.*]] = fcmp fast ole double [[ADD8_I_I]], 0.000000e+00
	; CHECK-NEXT: br i1 false, label [[BODY]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 false, label [[BODY]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: br i1 false, label [[IF_THEN135_I:%.*]], label [[IF_END209_I:%.*]]			; CHECK-NEXT: br i1 false, label [[IF_THEN135_I:%.*]], label [[IF_END209_I:%.*]]
	; CHECK: if.then135.i:			; CHECK: if.then135.i:
	; CHECK-NEXT: [[TMP6:%.*]] = fcmp fast olt <2 x double> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[CMP145_I:%.*]] = fcmp fast olt double [[PHI1]], 0.000000e+00
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i1> [[TMP6]], i32 0			; CHECK-NEXT: [[CMP152_I:%.*]] = fcmp fast olt double [[PHI2]], 0.000000e+00
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i1> <i1 poison, i1 false>, i1 [[TMP7]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i1> <i1 poison, i1 false>, i1 [[CMP152_I]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = select <2 x i1> [[TMP8]], <2 x double> zeroinitializer, <2 x double> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = select <2 x i1> [[TMP0]], <2 x double> zeroinitializer, <2 x double> zeroinitializer
	; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <2 x double> zeroinitializer, [[TMP9]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <2 x double> zeroinitializer, [[TMP1]]
	; CHECK-NEXT: [[TMP11:%.*]] = fmul fast <2 x double> [[TMP10]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x double> [[TMP2]], zeroinitializer
	; CHECK-NEXT: [[TMP12:%.*]] = fadd fast <2 x double> [[TMP11]], zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = fadd fast <2 x double> [[TMP3]], zeroinitializer
	; CHECK-NEXT: br label [[IF_END209_I]]			; CHECK-NEXT: br label [[IF_END209_I]]
	; CHECK: if.end209.i:			; CHECK: if.end209.i:
	; CHECK-NEXT: [[TMP13:%.*]] = phi <2 x double> [ [[TMP12]], [[IF_THEN135_I]] ], [ zeroinitializer, [[EXIT]] ]			; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x double> [ [[TMP4]], [[IF_THEN135_I]] ], [ zeroinitializer, [[EXIT]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %body			br label %body

	body:			body:
	%phi1 = phi double [ 0.000000e+00, %entry ], [ 0.000000e+00, %body ]			%phi1 = phi double [ 0.000000e+00, %entry ], [ 0.000000e+00, %body ]
	%phi2 = phi double [ 0.000000e+00, %entry ], [ 0.000000e+00, %body ]			%phi2 = phi double [ 0.000000e+00, %entry ], [ 0.000000e+00, %body ]

Revision Contents

Diff 427293