[SLP]Further improvement of the cost model for scalars used in buildvectors.
ClosedPublic

Authored by ABataev on Dec 14 2021, 12:08 PM.

Details

Summary

Further improvement of the cost model for the scalars used in buildvector sequences. The main functionality is outlined into a separate function.

The cost is calculated in the following way:

  1. If the Base vector is not an undef vector, resize the very first mask to the common VF and perform the action for 2 input vectors (including the non-undef Base). The other shuffle masks are combined with the result of this first stage and processed as a shuffle of 2 vectors.
  2. If the Base is an undef vector and there is only 1 shuffle mask, perform the action only for 1 vector with the given mask, if it is not the identity mask.
  3. If > 2 masks are used, perform a series of shuffle actions for 2 vectors, combining the masks properly between the steps.
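
To make the three cases easier to follow, here is a minimal standalone sketch that only counts how many single-source and two-source shuffles each case would need. It is not the code from the patch: `kUndefLane`, `ShuffleCount`, `isIdentityMask` and `countShuffleActions` are hypothetical names, the resizing to a common VF is ignored, and treating exactly 2 masks on an undef Base as one two-source shuffle is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for LLVM's UndefMaskElem: -1 marks an unused lane.
constexpr int kUndefLane = -1;
using Mask = std::vector<int>;

// How many shuffles of each kind a cost model would have to price.
struct ShuffleCount {
  int OneSource = 0; // permutation of a single vector
  int TwoSource = 0; // shuffle that reads two vectors
};

// True if every defined lane I selects element I, so no shuffle is needed.
bool isIdentityMask(const Mask &M) {
  for (std::size_t I = 0; I < M.size(); ++I)
    if (M[I] != kUndefLane && M[I] != static_cast<int>(I))
      return false;
  return true;
}

// Walks the three cases from the summary. Masks holds one mask per source
// vector that feeds the buildvector sequence.
ShuffleCount countShuffleActions(bool BaseIsUndef,
                                 const std::vector<Mask> &Masks) {
  ShuffleCount Count;
  if (Masks.empty())
    return Count;
  const int NumMasks = static_cast<int>(Masks.size());
  if (!BaseIsUndef) {
    // Case 1: the first mask is shuffled together with Base, and every
    // further mask is combined with the running result.
    Count.TwoSource = NumMasks;
    return Count;
  }
  if (NumMasks == 1) {
    // Case 2: a single source only needs a permutation, and only if the
    // mask is not the identity.
    if (!isIdentityMask(Masks.front()))
      Count.OneSource = 1;
    return Count;
  }
  // Case 3 (assumed to cover exactly 2 masks as well): the first two
  // sources are shuffled together, then one more shuffle per extra source.
  Count.TwoSource = NumMasks - 1;
  return Count;
}
```

Under these assumptions, a non-undef Base with two masks yields two two-source shuffles, while an undef Base with a single identity mask yields none.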

The original implementation misses the very first analysis for the Base vector, so the cost might be too optimistic in some cases. But it improves the cost for the insertelements which are part of the current SLP graph.

Part of D107966.

Event Timeline

ABataev created this revision. Dec 14 2021, 12:08 PM
ABataev requested review of this revision. Dec 14 2021, 12:08 PM
Herald added a project: Restricted Project. Dec 14 2021, 12:08 PM
RKSimon edited the summary of this revision. Jan 3 2022, 8:54 AM
Herald added a project: Restricted Project. Fri, Apr 29, 10:55 AM
Herald added a subscriber: vporpo.

Anything we can do to simplify this patch would be great - there's a lot going on.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6329

Do you intend to use performExtractsShuffleAction more than once in the future? Otherwise some of these function_ref seem superfluous.

6473–6474

Some of this is NFC - pre-commit to reduce the patch?

6477–6480

IEBase is invariant to the inner do-while loop - should this still use Base?

6495–6502

Pulling out Idx is just a NFC - precommit?

ABataev added inline comments. Sun, May 1, 6:36 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6329

Yes, this patch is part of D107966, which unifies cost estimation and code emission.

6477–6480

I'll check

RKSimon added inline comments. Sun, May 1, 6:47 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6563

It might make it cleaner if we pull out the lambda instead of embedding inside the performExtractsShuffleAction call?

ABataev updated this revision to Diff 426506. Mon, May 2, 1:22 PM

Address comments + rebase.

RKSimon added inline comments. Tue, May 3, 6:24 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583

Is this correct - not sure if you want braces here or just a newline

ABataev added inline comments. Tue, May 3, 8:37 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583

I want to dump the TreeEntries here if possible; that gives more information about the graph than just an instruction.

ABataev added inline comments. Tue, May 3, 8:45 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583

Ah, did not understand the question. Everything is ok here, just the first element might be a nullptr

RKSimon added inline comments. Tue, May 3, 8:47 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583

OK - but maybe put TEs.back()->dump(); on a separate line so it doesn't look like it should be:

if (TEs.front()) { TEs.front()->dump(); TEs.back()->dump(); }

I'm very surprised clang-format didn't catch this

ABataev added inline comments. Tue, May 3, 8:53 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583

I'll enclose the first dump in braces explicitly; clang-format wants to put them on one line.

RKSimon added inline comments. Tue, May 3, 9:04 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583

That's scary - I guess because this is all inside the LLVM_DEBUG macro...

ABataev updated this revision to Diff 426768. Tue, May 3, 10:38 AM

Rebase + formatting

RKSimon accepted this revision. Wed, May 4, 8:59 AM

LGTM

This revision is now accepted and ready to land. Wed, May 4, 8:59 AM
This revision was landed with ongoing or failed builds. Thu, May 5, 6:06 AM
This revision was automatically updated to reflect the committed changes.

This caused (or exposed?) failing asserts:

$ cat sqrtf.c
float *a;
float c;
float sqrtf(float);
void b() {
  float d, e, f, g;
  d = c * (0 + 2 * sqrtf(c) * 0);
  e = c * (0 - (c + 1) * 0);
  f = c * (0 - 2 * sqrtf(c) * 0);
  g = 2 * (0 + (c + 1) * 0);
  a[0] = d / 0;
  a[1] = e / 0;
  a[2] = f / 0;
  a[3] = g / 0;
}
$ clang -target x86_64-linux-gnu -c -O2 -fno-math-errno sqrtf.c
clang: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:6592: llvm::InstructionCost llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>): Assertion `Mask[InIdx] == UndefMaskElem && "InsertElementInstruction used already."' failed.

Hi, thanks for the report, feel free to revert, I'll fix it tomorrow.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6583

Yes, I believe so.

repro:

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "fuzz.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define dso_local void @b() local_unnamed_addr #0 {
entry:
  %mul = fmul float undef, 2.000000e+00
  %i = tail call float @llvm.fmuladd.f32(float %mul, float 0.000000e+00, float 0.000000e+00)
  %mul2 = fmul float undef, %i
  %add = fadd float undef, 1.000000e+00
  %neg = fneg float %add
  %i1 = tail call float @llvm.fmuladd.f32(float %neg, float 0.000000e+00, float 0.000000e+00)
  %mul4 = fmul float undef, %i1
  %neg7 = fneg float %mul
  %i2 = tail call float @llvm.fmuladd.f32(float %neg7, float 0.000000e+00, float 0.000000e+00)
  %mul8 = fmul float undef, %i2
  %i3 = tail call float @llvm.fmuladd.f32(float %add, float 0.000000e+00, float 0.000000e+00)
  %mul11 = fmul float %i3, 2.000000e+00
  %div = fdiv float %mul2, 0.000000e+00
  store float %div, ptr undef, align 4
  %div12 = fdiv float %mul4, 0.000000e+00
  %arrayidx13 = getelementptr inbounds float, ptr undef, i64 1
  store float %div12, ptr %arrayidx13, align 4
  %div14 = fdiv float %mul8, 0.000000e+00
  %arrayidx15 = getelementptr inbounds float, ptr undef, i64 2
  store float %div14, ptr %arrayidx15, align 4
  %div16 = fdiv float %mul11, 0.000000e+00
  %arrayidx17 = getelementptr inbounds float, ptr undef, i64 3
  store float %div16, ptr %arrayidx17, align 4
  ret void
}

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.fmuladd.f32(float, float, float) #1

attributes #0 = { "tune-cpu"="generic" }
attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

ronlieb added a subscriber: ronlieb. Sun, May 8, 8:32 PM

i am seeing the same assert on the amdgpu target

the reproducer posted by previous two responders is way smaller than what i have

sberg added a subscriber: sberg. Mon, May 9, 11:53 AM

still fails for me, see https://github.com/llvm/llvm-project/issues/55359 "'llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6596: llvm::InstructionCost llvm::slpvectorizer::BoUpSLP::getTreeCost(ArrayRef<llvm::Value *>): Assertion `Mask[InIdx] == UndefMaskElem && "InsertElementInstruction used already."' failed.' since https://github.com/llvm/llvm-project/commit/99f31acfce338417fea3c14983d6f8fedc8ed043 '[SLP]Further improvement of the cost model for scalars used in buildvectors.'" (which I filed so that I could attach the non-reduced reproducer there)

another issue that repros at head

$ cat /tmp/a.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define internal void @0() {
  br label %1

1:                                                ; preds = %1, %0
  %2 = fadd float 0.000000e+00, 0.000000e+00
  %3 = fadd float 0.000000e+00, 0.000000e+00
  %4 = insertelement <2 x float> zeroinitializer, float %2, i64 0
  %5 = insertelement <2 x float> %4, float %3, i64 1
  %6 = insertelement <2 x float> %5, float %2, i64 0
  br label %1
}
$ ./build/rel/bin/opt -passes=slp-vectorizer -disable-output /tmp/a.ll
opt: ../../llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6500: llvm::InstructionCost llvm::slpvectorizer::BoUpSLP::getTreeCost(ArrayRef<llvm::Value *>): Assertion `Mask[InIdx] == UndefMaskElem && "InsertElementInstruction used already."' failed.

(I did see a different instance of this crash get fixed with some other commit but this one still remains)

Yes, I have a fix for this issue already; I need to relax an assert and will commit in a couple of minutes.
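
For readers following along: the /tmp/a.ll reproducer above inserts into lane 0 twice (%4 and %6), which appears to be the situation the strict `Mask[InIdx] == UndefMaskElem` assertion rejects. The sketch below is only an illustration of that condition and of the insertelement semantics (the last insert into a lane wins); it is not the actual fix, and `kNoWriter`, `hasRepeatedLane` and `lastWriterPerLane` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical marker for a lane that no insertelement in the chain writes.
constexpr int kNoWriter = -1;

// True if two inserts in the chain target the same lane. InsertLanes lists
// the lane index of each insertelement in program order.
bool hasRepeatedLane(std::size_t VF,
                     const std::vector<std::size_t> &InsertLanes) {
  std::vector<bool> Seen(VF, false);
  for (std::size_t Lane : InsertLanes) {
    if (Lane >= VF)
      continue; // out-of-range lanes are ignored in this sketch
    if (Seen[Lane])
      return true;
    Seen[Lane] = true;
  }
  return false;
}

// For each lane, the position in the chain of the insert whose value
// survives; a later insert into the same lane overwrites an earlier one.
std::vector<int> lastWriterPerLane(std::size_t VF,
                                   const std::vector<std::size_t> &InsertLanes) {
  std::vector<int> Writer(VF, kNoWriter);
  for (std::size_t I = 0; I < InsertLanes.size(); ++I)
    if (InsertLanes[I] < VF)
      Writer[InsertLanes[I]] = static_cast<int>(I);
  return Writer;
}
```

For the chain in the reproducer (lanes 0, 1, 0 with VF 2), `hasRepeatedLane` reports a repeat, and `lastWriterPerLane` says lane 0 ends up holding the value of the third insert.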

Decided to revert this patch and several others to fix the bugs.

another crash even with https://reviews.llvm.org/rGcce80bd8b74d54deb82b1b6ae0cbec1ab53c1dbb

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define internal void @0() {
.lr.ph.i41:
  br label %.lr.ph.split.us.i

.lr.ph.split.us.i:                                ; preds = %.lr.ph.split.us.i, %.lr.ph.i41
  %0 = fadd float 0.000000e+00, 0.000000e+00
  %1 = fadd float 0.000000e+00, 0.000000e+00
  %2 = fadd float %0, 0.000000e+00
  %3 = fadd float %1, 0.000000e+00
  %.sroa.3.8.vec.insert.i.us.i = insertelement <2 x float> zeroinitializer, float %2, i64 0
  %.sroa.3.12.vec.insert.i.us.i = insertelement <2 x float> %.sroa.3.8.vec.insert.i.us.i, float %3, i64 1
  %.sroa.025.4.vec.insert.us.i = insertelement <2 x float> %.sroa.3.12.vec.insert.i.us.i, float %0, i64 0
  br label %.lr.ph.split.us.i
}

Thanks, will add it to fixed version of the patch

aeubanks added a comment. Edited Mon, May 9, 1:46 PM

(if you're interested in testing the recommit, consider running opt -O2 on the attached file, that's where I've been pulling the crashes from)

Another crash caused by this:

$ cat /tmp/f.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-android23"

%0 = type { float, float, float, float }

; Function Attrs: argmemonly nocallback nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #0

; Function Attrs: argmemonly nocallback nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #0

; Function Attrs: argmemonly nofree nounwind willreturn writeonly
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1

define internal void @0() {
  %1 = getelementptr inbounds %0, ptr undef, i64 0, i32 2
  %2 = load float, ptr %1, align 4
  %3 = load float, ptr undef, align 4
  %4 = fsub float %2, %3
  %5 = getelementptr inbounds %0, ptr undef, i64 0, i32 3
  %6 = load float, ptr %5, align 4
  %7 = getelementptr inbounds %0, ptr undef, i64 0, i32 1
  %8 = load float, ptr %7, align 4
  %9 = fsub float %6, %8
  %10 = fcmp olt float %9, %4
  %11 = insertelement <2 x float> undef, float %3, i64 0
  %12 = insertelement <2 x float> zeroinitializer, float 0.000000e+00, i64 0
  store <2 x float> zeroinitializer, ptr null, align 4
  %13 = insertelement <2 x float> %11, float %6, i64 0
  store <2 x float> zeroinitializer, ptr null, align 4
  ret void
}

attributes #0 = { argmemonly nocallback nofree nosync nounwind willreturn }
attributes #1 = { argmemonly nofree nounwind willreturn writeonly }
$ bin/opt -passes=slp-vectorizer -disable-output /tmp/f.ll
opt: ../../llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6657: T *performExtractsShuffleAction(MutableArrayRef<std::pair<T *, SmallVector<int>>>, llvm::Value *, function_ref<unsigned int (T *)>, function_ref<std::pair<T *, bool> (T *, ArrayRef<int>)>, function_ref<T *(ArrayRef<int>, ArrayRef<T *>)>) [T = const llvm::slpvectorizer::BoUpSLP::TreeEntry]: Assertion `Mask[I] == UndefMaskElem && "Multiple uses of scalars."' failed.

Another corner case of a too-strict assertion that I was afraid of. Will fix it ASAP.

Fixed in 85f6b15ee50f

another one :)

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-android23"

%0 = type { float, float, float, float }

define internal void @0() {
  %1 = getelementptr inbounds %0, ptr undef, i64 0, i32 2
  %2 = load float, ptr %1, align 4
  %3 = load float, ptr undef, align 4
  %4 = fsub float %2, %3
  %5 = getelementptr inbounds %0, ptr undef, i64 0, i32 3
  %6 = load float, ptr %5, align 4
  %7 = getelementptr inbounds %0, ptr undef, i64 0, i32 1
  %8 = load float, ptr %7, align 4
  %9 = fsub float %6, %8
  %10 = fcmp olt float %9, %4
  %.sroa.0.0.vec.insert.i5.i10 = insertelement <2 x float> undef, float %3, i64 0
  %.sroa.0.4.vec.insert.i10.i13 = insertelement <2 x float> %.sroa.0.0.vec.insert.i5.i10, float %8, i64 1
  store <2 x float> %.sroa.0.4.vec.insert.i10.i13, ptr null, align 4
  %.sroa.0.4.vec.insert.i10.i13.2 = insertelement <2 x float> %.sroa.0.0.vec.insert.i5.i10, float %6, i64 1
  store <2 x float> %.sroa.0.4.vec.insert.i10.i13.2, ptr null, align 4
  ret void
}

Fixed in 152072801e24fb1e5cd962b0cb089230bc27b6b9