Page MenuHomePhabricator

[COST]Improve cost model for shuffles in SLP.
ClosedPublic

Authored by ABataev on Apr 14 2021, 8:19 AM.

Details

Summary

Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Thanks for looking at this - this is well overdue!

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
1229

We already have something similar in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE - do you think we could have a single version of the code some place?

ABataev added inline comments.Jul 15 2021, 5:39 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
1229

Will check if can merge it somehow.

ABataev updated this revision to Diff 362162.Jul 27 2021, 1:22 PM

Rebase + address comments.

Currently in hangs the compiler in tests test/CodeGen/Thumb2/mve-vst4.ll and llvm/test/CodeGen/Thumb2/mve-vst3.ll, need to investigate.

I hadn't been expecting you'd have to change the legalization implemention, at least not as an initial patch - can we not keep closer to the original SplitVecRes_VECTOR_SHUFFLE implementation?

I hadn't been expecting you'd have to change the legalization implemention, at least not as an initial patch - can we not keep closer to the original SplitVecRes_VECTOR_SHUFFLE implementation?

Do you mean, use extract/inserts in case if we have more than 2 input vectors?

ABataev updated this revision to Diff 362526.Jul 28 2021, 1:49 PM

Rebase + address comments.

vporpo added a subscriber: vporpo.Nov 11 2021, 7:57 PM

Anything else here?

Sorry for the slow response on this one - please can you split the DAG legalization change out as the first patch and the TTI change as a followup?

llvm/include/llvm/Analysis/VectorUtils.h
401

Description comment?

402

Should this be splitShuffleMasks?

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2488

I'm rather surprised this didn't reduce further (same for the TTI) - maybe there's more functionality that can be moved into the computeShuffleMasks helper?

Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2022, 1:58 PM
RKSimon added inline comments.Apr 20 2022, 2:23 PM
llvm/test/Analysis/CostModel/X86/reduction.ll
64

Something might be still going wrong here - SSE max legal type is v4i32 so this cost should be 0 as its referencing a single existing vector

ABataev added inline comments.Apr 20 2022, 2:35 PM
llvm/test/Analysis/CostModel/X86/reduction.ll
64

I'm kind of pessimistic here and if the dest reg is not the same as src, I consider it as a reg copy and add cost TCC_Basic.

RKSimon accepted this revision.Tue, Apr 26, 5:09 AM

LGTM

This revision is now accepted and ready to land.Tue, Apr 26, 5:09 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Thu, Apr 28, 6:56 AM

Just a heads up that with this change building llvm-test-suite/SingleSource/UnitTests/matrix-types-spec.cpp crashes on X86. I'll see if I can extract a reproducer.

Just a heads up that with this change building llvm-test-suite/SingleSource/UnitTests/matrix-types-spec.cpp crashes on X86. I'll see if I can extract a reproducer.

Hi, most probably the crash is caused by the opaque pointers. This patch does not change the vectorizer behavior, just adjusts the cost of the vector shuffles ops.

fhahn added a comment.Thu, Apr 28, 7:02 AM

Just a heads up that with this change building llvm-test-suite/SingleSource/UnitTests/matrix-types-spec.cpp crashes on X86. I'll see if I can extract a reproducer.

Hi, most probably the crash is caused by the opaque pointers. This patch does not change the vectorizer behavior, just adjusts the cost of the vector shuffles ops.

It crashes when calculating the cost with

Assertion failed: (idx < size()), function operator[], file SmallVector.h, line 273.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments:  clang++ -DNDEBUG -O3 -O3 -DNDEBUG -isysroot ...
1.	<eof> parser at end of file
2.	Optimizer
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang-15                 0x000000010a03f8c7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 39
1  clang-15                 0x000000010a03e6f8 llvm::sys::RunSignalHandlers() + 248
2  clang-15                 0x000000010a03ed20 llvm::sys::CleanupOnSignal(unsigned long) + 208
3  clang-15                 0x0000000109f6417a (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) + 106
4  clang-15                 0x0000000109f6435e CrashRecoverySignalHandler(int) + 110
5  libsystem_platform.dylib 0x00007ff808cfddfd _sigtramp + 29
6  libsystem_platform.dylib 000000000000000000 _sigtramp + 18446603370433094176
7  libsystem_c.dylib        0x00007ff808c33d24 abort + 123
8  libsystem_c.dylib        0x00007ff808c330cb err + 0
9  clang-15                 0x000000010d263f33 llvm::processShuffleMasks(llvm::ArrayRef<int>, unsigned int, unsigned int, unsigned int, llvm::function_ref<void ()>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>) (.cold.9) + 35
10 clang-15                 0x00000001091758c4 llvm::processShuffleMasks(llvm::ArrayRef<int>, unsigned int, unsigned int, unsigned int, llvm::function_ref<void ()>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>) + 2244
11 clang-15                 0x000000010886d07b llvm::X86TTIImpl::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) + 1755
12 clang-15                 0x00000001088666a7 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) + 4551
13 clang-15                 0x000000010913f4b2 llvm::TargetTransformInfo::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) const + 18
14 clang-15                 0x0000000109d40056 llvm::TargetTransformInfo::getUserCost(llvm::User const*, llvm::TargetTransformInfo::TargetCostKind) const + 214
15 clang-15                 0x0000000108f97ed5 (anonymous namespace)::CallAnalyzer::visitInstruction(llvm::Instruction&) + 37

Just a heads up that with this change building llvm-test-suite/SingleSource/UnitTests/matrix-types-spec.cpp crashes on X86. I'll see if I can extract a reproducer.

Hi, most probably the crash is caused by the opaque pointers. This patch does not change the vectorizer behavior, just adjusts the cost of the vector shuffles ops.

It crashes when calculating the cost with

Assertion failed: (idx < size()), function operator[], file SmallVector.h, line 273.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments:  clang++ -DNDEBUG -O3 -O3 -DNDEBUG -isysroot ...
1.	<eof> parser at end of file
2.	Optimizer
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang-15                 0x000000010a03f8c7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 39
1  clang-15                 0x000000010a03e6f8 llvm::sys::RunSignalHandlers() + 248
2  clang-15                 0x000000010a03ed20 llvm::sys::CleanupOnSignal(unsigned long) + 208
3  clang-15                 0x0000000109f6417a (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) + 106
4  clang-15                 0x0000000109f6435e CrashRecoverySignalHandler(int) + 110
5  libsystem_platform.dylib 0x00007ff808cfddfd _sigtramp + 29
6  libsystem_platform.dylib 000000000000000000 _sigtramp + 18446603370433094176
7  libsystem_c.dylib        0x00007ff808c33d24 abort + 123
8  libsystem_c.dylib        0x00007ff808c330cb err + 0
9  clang-15                 0x000000010d263f33 llvm::processShuffleMasks(llvm::ArrayRef<int>, unsigned int, unsigned int, unsigned int, llvm::function_ref<void ()>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>) (.cold.9) + 35
10 clang-15                 0x00000001091758c4 llvm::processShuffleMasks(llvm::ArrayRef<int>, unsigned int, unsigned int, unsigned int, llvm::function_ref<void ()>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>) + 2244
11 clang-15                 0x000000010886d07b llvm::X86TTIImpl::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) + 1755
12 clang-15                 0x00000001088666a7 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) + 4551
13 clang-15                 0x000000010913f4b2 llvm::TargetTransformInfo::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) const + 18
14 clang-15                 0x0000000109d40056 llvm::TargetTransformInfo::getUserCost(llvm::User const*, llvm::TargetTransformInfo::TargetCostKind) const + 214
15 clang-15                 0x0000000108f97ed5 (anonymous namespace)::CallAnalyzer::visitInstruction(llvm::Instruction&) + 37

Ok, the reproducer should help.

Just a heads up that with this change building llvm-test-suite/SingleSource/UnitTests/matrix-types-spec.cpp crashes on X86. I'll see if I can extract a reproducer.

Hi, most probably the crash is caused by the opaque pointers. This patch does not change the vectorizer behavior, just adjusts the cost of the vector shuffles ops.

It crashes when calculating the cost with

Assertion failed: (idx < size()), function operator[], file SmallVector.h, line 273.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments:  clang++ -DNDEBUG -O3 -O3 -DNDEBUG -isysroot ...
1.	<eof> parser at end of file
2.	Optimizer
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang-15                 0x000000010a03f8c7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 39
1  clang-15                 0x000000010a03e6f8 llvm::sys::RunSignalHandlers() + 248
2  clang-15                 0x000000010a03ed20 llvm::sys::CleanupOnSignal(unsigned long) + 208
3  clang-15                 0x0000000109f6417a (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) + 106
4  clang-15                 0x0000000109f6435e CrashRecoverySignalHandler(int) + 110
5  libsystem_platform.dylib 0x00007ff808cfddfd _sigtramp + 29
6  libsystem_platform.dylib 000000000000000000 _sigtramp + 18446603370433094176
7  libsystem_c.dylib        0x00007ff808c33d24 abort + 123
8  libsystem_c.dylib        0x00007ff808c330cb err + 0
9  clang-15                 0x000000010d263f33 llvm::processShuffleMasks(llvm::ArrayRef<int>, unsigned int, unsigned int, unsigned int, llvm::function_ref<void ()>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>) (.cold.9) + 35
10 clang-15                 0x00000001091758c4 llvm::processShuffleMasks(llvm::ArrayRef<int>, unsigned int, unsigned int, unsigned int, llvm::function_ref<void ()>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>) + 2244
11 clang-15                 0x000000010886d07b llvm::X86TTIImpl::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) + 1755
12 clang-15                 0x00000001088666a7 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) + 4551
13 clang-15                 0x000000010913f4b2 llvm::TargetTransformInfo::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) const + 18
14 clang-15                 0x0000000109d40056 llvm::TargetTransformInfo::getUserCost(llvm::User const*, llvm::TargetTransformInfo::TargetCostKind) const + 214
15 clang-15                 0x0000000108f97ed5 (anonymous namespace)::CallAnalyzer::visitInstruction(llvm::Instruction&) + 37

Could you just share your compile options for the source file compilation?

fhahn added a comment.Thu, Apr 28, 7:28 AM

Reproducer below crashes with bin/opt -passes="print<cost-model>"

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin"

define void @test() {
entry:
  %matins.2.2 = shufflevector <9 x double> undef, <9 x double> undef, <9 x i32> <i32 0, i32 3, i32 6, i32 1, i32 4, i32 7, i32 2, i32 5, i32 8>
  ret void
}

Reproducer below crashes with bin/opt -passes="print<cost-model>"

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin"

define void @test() {
entry:
  %matins.2.2 = shufflevector <9 x double> undef, <9 x double> undef, <9 x i32> <i32 0, i32 3, i32 6, i32 1, i32 4, i32 7, i32 2, i32 5, i32 8>
  ret void
}

Thanks, will revert the patch to fix the bug.

Hi Alexey. Here is another crash reproducer:

; bin/opt -mcpu=corei7-avx -passes="print<cost-model>" -S test.ll

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define dso_local <12 x i64> @foo(<12 x i64> noundef %src) {
entry:

%shuffle = shufflevector <12 x i64> %src, <12 x i64> poison, <12 x i32> <i32 0, i32 3, i32 6, i32 9, i32 1, i32 4, i32 7, i32 10, i32 2, i32 5, i32 8, i32 11>
ret <12 x i64> %shuffle

}

Hi Alexey. Here is another crash reproducer:

; bin/opt -mcpu=corei7-avx -passes="print<cost-model>" -S test.ll

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define dso_local <12 x i64> @foo(<12 x i64> noundef %src) {
entry:

%shuffle = shufflevector <12 x i64> %src, <12 x i64> poison, <12 x i32> <i32 0, i32 3, i32 6, i32 9, i32 1, i32 4, i32 7, i32 10, i32 2, i32 5, i32 8, i32 11>
ret <12 x i64> %shuffle

}

Fixed in 371412e065a63107d5d79330da6757ff693d91cc

jgorbe added a subscriber: jgorbe.Thu, May 5, 11:14 AM

We currently root-caused a 20% regression in eigen complex to this patch and it is blocking our compiler release.
I need some time to do additional testing, given there are at least a few patches on top of this. So far I see a crash fix, two NFCs and two additional changes that may also affect performance (https://reviews.llvm.org/D114171 and https://reviews.llvm.org/D115750).
Can you please hold off on any additional changes while we test further, or include them here for review so I can check if they resolve the regressions seen so far?

We currently root-caused a 20% regression in eigen complex to this patch and it is blocking our compiler release.
I need some time to do additional testing, given there are at least a few patches on top of this. So far I see a crash fix, two NFCs and two additional changes that may also affect performance (https://reviews.llvm.org/D114171 and https://reviews.llvm.org/D115750).
Can you please hold off on any additional changes while we test further, or include them here for review so I can check if they resolve the regressions seen so far?

Most probably, some overoptimistic cost model estimations turned on some extra vectorization. The reproducer will help, if you can provide it.

We currently root-caused a 20% regression in eigen complex to this patch and it is blocking our compiler release.
I need some time to do additional testing, given there are at least a few patches on top of this. So far I see a crash fix, two NFCs and two additional changes that may also affect performance (https://reviews.llvm.org/D114171 and https://reviews.llvm.org/D115750).
Can you please hold off on any additional changes while we test further, or include them here for review so I can check if they resolve the regressions seen so far?

Most probably, some overoptimistic cost model estimations turned on some extra vectorization. The reproducer will help, if you can provide it.

The reproducer is open source: https://gitlab.com/libeigen/eigen
Regressions are on haswell platform, xfdo configuration; the compiler used is bootstrapped against itself; revision tested includes this patch and all follow-up changes to the SLPVectorizer including D114171 and excluding D115750.
A few number samples:
BM_MatrixVectorMultiply_Complex_EigenRowMajorDynamic_float_16x64_ 16%
BM_MatrixVectorMultiply_Complex_EigenRowMajorDynamic_float_64x64_ 17%
BM_MatrixVectorMultiply_Complex_EigenRowMajorFixed_float_64x64_ 15%
BM_MatrixVectorMultiply_Complex_EigenRowMajorFixed_float_16x64_ 14%

We currently root-caused a 20% regression in eigen complex to this patch and it is blocking our compiler release.
I need some time to do additional testing, given there are at least a few patches on top of this. So far I see a crash fix, two NFCs and two additional changes that may also affect performance (https://reviews.llvm.org/D114171 and https://reviews.llvm.org/D115750).
Can you please hold off on any additional changes while we test further, or include them here for review so I can check if they resolve the regressions seen so far?

Most probably, some overoptimistic cost model estimations turned on some extra vectorization. The reproducer will help, if you can provide it.

The reproducer is open source: https://gitlab.com/libeigen/eigen
Regressions are on haswell platform, xfdo configuration; the compiler used is bootstrapped against itself; revision tested includes this patch and all follow-up changes to the SLPVectorizer including D114171 and excluding D115750.
A few number samples:
BM_MatrixVectorMultiply_Complex_EigenRowMajorDynamic_float_16x64_ 16%
BM_MatrixVectorMultiply_Complex_EigenRowMajorDynamic_float_64x64_ 17%
BM_MatrixVectorMultiply_Complex_EigenRowMajorFixed_float_64x64_ 15%
BM_MatrixVectorMultiply_Complex_EigenRowMajorFixed_float_16x64_ 14%

Could you at least extract the llvm ir code for one of these functions before the changes and after, so I coukd at least compare them?

We currently root-caused a 20% regression in eigen complex to this patch and it is blocking our compiler release.
I need some time to do additional testing, given there are at least a few patches on top of this. So far I see a crash fix, two NFCs and two additional changes that may also affect performance (https://reviews.llvm.org/D114171 and https://reviews.llvm.org/D115750).
Can you please hold off on any additional changes while we test further, or include them here for review so I can check if they resolve the regressions seen so far?

Most probably, some overoptimistic cost model estimations turned on some extra vectorization. The reproducer will help, if you can provide it.

The reproducer is open source: https://gitlab.com/libeigen/eigen
Regressions are on haswell platform, xfdo configuration; the compiler used is bootstrapped against itself; revision tested includes this patch and all follow-up changes to the SLPVectorizer including D114171 and excluding D115750.
A few number samples:
BM_MatrixVectorMultiply_Complex_EigenRowMajorDynamic_float_16x64_ 16%
BM_MatrixVectorMultiply_Complex_EigenRowMajorDynamic_float_64x64_ 17%
BM_MatrixVectorMultiply_Complex_EigenRowMajorFixed_float_64x64_ 15%
BM_MatrixVectorMultiply_Complex_EigenRowMajorFixed_float_16x64_ 14%

Could you at least extract the llvm ir code for one of these functions before the changes and after, so I coukd at least compare them?

Here are the only differences I see for dynamic, 64x64; base is before the patch, experiment includes this - revert+recommit -, https://reviews.llvm.org/rG371412e065a63107d5d79330da6757ff693d91cc, https://reviews.llvm.org/D114171 and the NFCs.
Looks like the change is in the presence of poison:


Another 30-40% regression on a non-public benchmark was root-caused to https://reviews.llvm.org/D114171, so investigating that.

We currently root-caused a 20% regression in eigen complex to this patch and it is blocking our compiler release.
I need some time to do additional testing, given there are at least a few patches on top of this. So far I see a crash fix, two NFCs and two additional changes that may also affect performance (https://reviews.llvm.org/D114171 and https://reviews.llvm.org/D115750).
Can you please hold off on any additional changes while we test further, or include them here for review so I can check if they resolve the regressions seen so far?

Most probably, some overoptimistic cost model estimations turned on some extra vectorization. The reproducer will help, if you can provide it.

The reproducer is open source: https://gitlab.com/libeigen/eigen
Regressions are on haswell platform, xfdo configuration; the compiler used is bootstrapped against itself; revision tested includes this patch and all follow-up changes to the SLPVectorizer including D114171 and excluding D115750.
A few number samples:
BM_MatrixVectorMultiply_Complex_EigenRowMajorDynamic_float_16x64_ 16%
BM_MatrixVectorMultiply_Complex_EigenRowMajorDynamic_float_64x64_ 17%
BM_MatrixVectorMultiply_Complex_EigenRowMajorFixed_float_64x64_ 15%
BM_MatrixVectorMultiply_Complex_EigenRowMajorFixed_float_16x64_ 14%

Could you at least extract the llvm ir code for one of these functions before the changes and after, so I coukd at least compare them?

Here are the only differences I see for dynamic, 64x64; base is before the patch, experiment includes this - revert+recommit -, https://reviews.llvm.org/rG371412e065a63107d5d79330da6757ff693d91cc, https://reviews.llvm.org/D114171 and the NFCs.
Looks like the change is in the presence of poison:


Another 30-40% regression on a non-public benchmark was root-caused to https://reviews.llvm.org/D114171, so investigating that.

Hmm, I don't see the difference, actually, just changed the order of evaluation and lanes switched, no actual difference in IR. Most probably, the regression is caused by something else, maybe some lowering passes were tweaked for the previous order but not for the current.

Following up on the regression I was seeing with https://reviews.llvm.org/D114171, the only meaningful IR change seems to be:
Before the patch (after slp):

%65 = shufflevector <2 x float> %9, <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%66 = insertelement <4 x float> %65, float %16, i32 2  
%67 = insertelement <4 x float> %66, float %15, i32 3
%68 = call <4 x float> @llvm.fabs.v4f32(<4 x float> %67)
%69 = fcmp oeq <4 x float> %68, <float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000>
%70 = freeze <4 x i1> %69 
%71 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %70)

After the patch (after slp)

%65 = shufflevector <2 x float> %9, <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>  
%66 = shufflevector <4 x float> poison, <4 x float> %65, <4 x i32> <i32 4, i32 5, i32 2, i32 3> 
%67 = insertelement <4 x float> %66, float %16, i32 2  
%68 = insertelement <4 x float> %67, float %15, i32 3
%69 = call <4 x float> @llvm.fabs.v4f32(<4 x float> %68)
%70 = fcmp oeq <4 x float> %69, <float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000>
%71 = freeze <4 x i1> %70  
%72 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %71)

I'm not clear why the order of 0,1 in the shufflevector changed or the purpose of the second shufflevector adding poison on positions that will be overwritten, but the operations that follow seem to not affected by these changes.
Just providing an update, and not digging further into it at this time.

Following up on the regression I was seeing with https://reviews.llvm.org/D114171, the only meaningful IR change seems to be:
Before the patch (after slp):

%65 = shufflevector <2 x float> %9, <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%66 = insertelement <4 x float> %65, float %16, i32 2  
%67 = insertelement <4 x float> %66, float %15, i32 3
%68 = call <4 x float> @llvm.fabs.v4f32(<4 x float> %67)
%69 = fcmp oeq <4 x float> %68, <float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000>
%70 = freeze <4 x i1> %69 
%71 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %70)

After the patch (after slp)

%65 = shufflevector <2 x float> %9, <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>  
%66 = shufflevector <4 x float> poison, <4 x float> %65, <4 x i32> <i32 4, i32 5, i32 2, i32 3> 
%67 = insertelement <4 x float> %66, float %16, i32 2  
%68 = insertelement <4 x float> %67, float %15, i32 3
%69 = call <4 x float> @llvm.fabs.v4f32(<4 x float> %68)
%70 = fcmp oeq <4 x float> %69, <float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000>
%71 = freeze <4 x i1> %70  
%72 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %71)

I'm not clear why the order of 0,1 in the shufflevector changed or the purpose of the second shufflevector adding poison on positions that will be overwritten, but the operations that follow seem to not affected by these changes.
Just providing an update, and not digging further into it at this time.

I'm still working on the reductions improvements. Hope to fix all these regressions.