This is an archive of the discontinued LLVM Phabricator instance.

Differential D107966

[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly.
Closed, Public

Authored by ABataev on Aug 12 2021, 8:26 AM.

Details

Reviewers

RKSimon
spatel
anton-afanasyev
dtemirbulatov

Commits

rG2ac5ebedeac4: [SLP]Do not emit extract elements for insertelements users, replace with…
rGfc9c59c355cb: [SLP]Do not emit extract elements for insertelements users, replace with…

Summary

The SLP vectorizer emits extractelement instructions for externally used
vectorized scalars and estimates the cost of each such extract. In many
cases, however, these scalars are inputs to insertelement instructions that
form a buildvector. Instead of an extractelement/insertelement pair, we can
estimate the cost of shuffle(s) and emit a series of shuffles, which can be
optimized further.

Tested using test-suite (+SPEC2017); the tests passed. SLP was able to
vectorize more instructions in many cases, and the number of
re-vectorization attempts (where we could try to vectorize buildvector
insertelements again and again) was reduced.
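
To illustrate the idea in the summary, here is a minimal sketch in plain C++ (it does not use LLVM). It models vectors as `std::vector<int>` and shows that a buildvector formed from extractelement/insertelement pairs yields the same lanes as a single two-source shuffle. All names (`Vec`, `extractElement`, `insertElement`, `shuffleVector`, `buildWithExtracts`, `buildWithShuffle`) are hypothetical and chosen only for this illustration; undef/poison lanes are not modeled.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Models extractelement: read one lane of a vector.
int extractElement(const Vec &V, std::size_t Idx) { return V[Idx]; }

// Models insertelement: copy the vector and overwrite one lane.
Vec insertElement(Vec V, int Scalar, std::size_t Idx) {
  V[Idx] = Scalar;
  return V;
}

// Models a two-source shufflevector: mask value M selects lane M of A when
// M < A.size(), otherwise lane M - A.size() of B.
Vec shuffleVector(const Vec &A, const Vec &B,
                  const std::vector<std::size_t> &Mask) {
  Vec R;
  R.reserve(Mask.size());
  for (std::size_t M : Mask)
    R.push_back(M < A.size() ? A[M] : B[M - A.size()]);
  return R;
}

// Buildvector formed with extract/insert pairs: two lanes of the vectorized
// value are extracted and inserted into lanes 0 and 1 of Base.
Vec buildWithExtracts(const Vec &Vectorized, const Vec &Base) {
  Vec R = Base;
  R = insertElement(R, extractElement(Vectorized, 1), 0);
  R = insertElement(R, extractElement(Vectorized, 0), 1);
  return R;
}

// The same result with one shuffle, assuming both inputs have 4 lanes:
// lanes 0 and 1 come from Vectorized (its lanes 1 and 0), lanes 2 and 3
// stay from Base (mask values 4 + 2 and 4 + 3).
Vec buildWithShuffle(const Vec &Vectorized, const Vec &Base) {
  return shuffleVector(Vectorized, Base, {1, 0, 6, 7});
}
```

With 4-lane inputs, both functions return the same vector, which is the equivalence the patch relies on when it replaces the extract/insert pairs with shuffles.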

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Aug 12 2021, 8:26 AM

Herald added a subscriber: hiraditya. Aug 12 2021, 8:26 AM

ABataev requested review of this revision. Aug 12 2021, 8:26 AM

Herald added a project: Restricted Project. Aug 12 2021, 8:26 AM

Harbormaster completed remote builds in B119264: Diff 366002. Aug 12 2021, 8:54 AM

Rebase

ABataev mentioned this in D108703: [SLP]No need to schedule/check parent for extract{element/value} instruction. Aug 25 2021, 9:54 AM

Harbormaster completed remote builds in B121185: Diff 368666. Aug 25 2021, 10:10 AM

Rebased. Checked that the test SLPVectorizer/X86/remark_extract_broadcast.ll (mentioned in D108703) is updated as requested.

lebedev.ri added a subscriber: lebedev.ri. Aug 26 2021, 5:30 AM

lebedev.ri added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–23	Thanks! This is clearly an improvement, but these two shuffles are still clearly redundant, because in either case, you end up with 0'th element of `LD` in some elements of output. In this case you could simply drop the first shuffle, and do the second one directly.

ABataev added inline comments. Aug 26 2021, 5:34 AM

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–23	I think, in codegen the first shuffle will be simply dropped (this is an identity shuffle). But I'll check what can be improved here.

lebedev.ri added inline comments. Aug 26 2021, 5:37 AM

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–23	Ignoring more complicated cases, perhaps the key point here is that the `TMP0` is an identity (=>single-source), non-width-changing shuffle, so it can be naturally dropped. `ShuffleVectorInst::isIdentityMask()` might be relevant.
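
The point about dropping an identity shuffle can be sketched in plain C++. This is a stand-in for illustration only and does not call `ShuffleVectorInst::isIdentityMask()` or any other LLVM API. The helper names `isIdentityMaskSketch` and `composeMasks` are hypothetical, masks are assumed to be single-source and non-width-changing, and `-1` models an undefined mask element.

```cpp
#include <cstddef>
#include <vector>

// Returns true if every defined mask element selects its own lane.
// -1 models an undefined element and is accepted in any position.
bool isIdentityMaskSketch(const std::vector<int> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] != -1 && Mask[I] != static_cast<int>(I))
      return false;
  return true;
}

// Composes two single-source shuffles applied one after another:
// lane I of the result reads lane Inner[Outer[I]] of the original source.
// Assumes every defined Outer element is a valid index into Inner.
std::vector<int> composeMasks(const std::vector<int> &Inner,
                              const std::vector<int> &Outer) {
  std::vector<int> R;
  R.reserve(Outer.size());
  for (int O : Outer)
    R.push_back(O < 0 ? -1 : Inner[static_cast<std::size_t>(O)]);
  return R;
}
```

When the inner mask is a fully defined identity such as `{0, 1, 2, 3}`, `composeMasks` returns the outer mask unchanged, so the first shuffle contributes nothing and the second can be applied to the original source directly.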

ABataev added inline comments. Aug 26 2021, 5:40 AM

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–23	Agree. Will check what can be done here to improve it.

Harbormaster completed remote builds in B121328: Diff 368855. Aug 26 2021, 6:25 AM

Address comments

Harbormaster completed remote builds in B121350: Diff 368885. Aug 26 2021, 8:48 AM

Rebase

Harbormaster completed remote builds in B124685: Diff 373619. Sep 20 2021, 9:54 AM

Please can you rebase?

In D107966#3030773, @RKSimon wrote:

Please can you rebase?

Sure, will do, just need to finish my work with other patches.

Rebase

Harbormaster completed remote builds in B126541: Diff 376175. Sep 30 2021, 6:56 AM

Rebase

Harbormaster completed remote builds in B126866: Diff 376954. Oct 4 2021, 12:07 PM

Rebase

Harbormaster completed remote builds in B130809: Diff 382455. Oct 26 2021, 2:21 PM

vporpo added a subscriber: vporpo. Nov 11 2021, 7:58 PM

Rebase

Harbormaster completed remote builds in B136686: Diff 390691. Nov 30 2021, 7:23 AM

RKSimon added inline comments. Dec 1 2021, 8:44 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5744	The (almost NFC) change to areTwoInsertFromSameBuildVector looks like it can be pulled out to simplify this patch.

rebase after D114909?

Rebase

Harbormaster completed remote builds in B137991: Diff 392525. Dec 7 2021, 2:40 PM

Rebase

Harbormaster completed remote builds in B138520: Diff 393275. Dec 9 2021, 2:16 PM

Rebase

Harbormaster completed remote builds in B139070: Diff 394037. Dec 13 2021, 2:47 PM

Rebase

RKSimon added inline comments. Dec 14 2021, 7:24 AM

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll
222	These look like NFC changes by the update script that can probably be pre-committed to reduce the patch?

It looks OK - but it's a LOT of dense code, which makes it very difficult to grok - better comments and possibly a simplification pass might be a good idea

Rebase

In D107966#3192290, @RKSimon wrote:

It looks OK - but it's a LOT of dense code, which makes it very difficult to grok - better comments and possibly a simplification pass might be a good idea

Will try to split it.

Harbormaster completed remote builds in B139221: Diff 394247. Dec 14 2021, 8:25 AM

ABataev mentioned this in D115750: [SLP]Further improvement of the cost model for scalars used in buildvectors. Dec 14 2021, 12:08 PM

Rebase

Harbormaster completed remote builds in B140901: Diff 396532. Dec 29 2021, 8:11 AM

ABataev mentioned this in rG99f31acfce33: [SLP]Further improvement of the cost model for scalars used in buildvectors. May 5 2022, 6:06 AM

rebase? not sure how big this is now

Herald added a project: Restricted Project. May 5 2022, 9:36 AM

In D107966#3494274, @RKSimon wrote:

rebase? not sure how big this is now

Working on it.

ABataev mentioned this in rGf5d45d70a511: [SLP]Further improvement of the cost model for scalars used in buildvectors. May 11 2022, 6:09 AM

Rebase

Harbormaster completed remote builds in B163936: Diff 428704. May 11 2022, 1:04 PM

RKSimon added inline comments. May 13 2022, 7:20 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7116	I find this control flow very confusing - is the 'cast<InsertElementInst>(Base)' guaranteed to match IEBase? we break after the if() above so we can't get here from there.

ABataev added inline comments. May 13 2022, 7:34 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7116	We just iterate through insertelements, which are not part of the vectorized buildvector. For example:

  %0 = insertelement %..., %a, 0
  %1 = insertelement %0, %b, 1
  %2 = insertelement %1, %c, 2

If %c is vectorized, we start looking through the buildvector, trying to find the vectorized base. Start from %2: getTreeEntry(%2) returns nullptr. Go to %1: getTreeEntry(%1) returns nullptr too (it is not part of a vectorized buildvector). Go to %0: getTreeEntry(%0) is vectorized and returns E. Iterate through all vectorized insertelements and build a mask. Put %2 in the list of insertelements that must be transformed to shuffles. Later, we analyze all inserts between %1 and %2 (including the boundaries). If they must be replaced with shuffles, we replace them with shuffles; the other insertelements remain as is, with their base changed to the shuffles.
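
The walk described in this comment can be sketched in plain C++. This is only a toy model under stated assumptions, not the code in SLPVectorizer.cpp: `InsertSketch` and `findVectorizedBase` are hypothetical names, the `InTree` flag stands in for `getTreeEntry(...)` returning a non-null entry, and mask building is left out.

```cpp
#include <cstddef>
#include <vector>

// Toy model of one insertelement in a buildvector chain.
struct InsertSketch {
  int Operand; // index of the insertelement used as vector operand, -1 if none
  bool InTree; // stand-in for "getTreeEntry(...) returned a non-null entry"
};

// Walks from Start through the vector operands until it reaches an insert
// that belongs to the vectorized tree. Returns the index of that base, or -1
// if the chain ends first. Every non-vectorized insert visited on the way is
// appended to Pending, in visiting order.
int findVectorizedBase(const std::vector<InsertSketch> &Chain, int Start,
                       std::vector<int> &Pending) {
  for (int Cur = Start; Cur >= 0;
       Cur = Chain[static_cast<std::size_t>(Cur)].Operand) {
    if (Chain[static_cast<std::size_t>(Cur)].InTree)
      return Cur;
    Pending.push_back(Cur);
  }
  return -1;
}
```

For the three-insert example above, with only %0 marked as vectorized, starting at %2 visits %2 and %1 and then stops at %0 as the base.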

Rebase

Harbormaster completed remote builds in B164394: Diff 429354. May 13 2022, 4:57 PM

LGTM

This revision is now accepted and ready to land. May 14 2022, 2:40 AM

This revision was landed with ongoing or failed builds. May 20 2022, 6:00 AM

Closed by commit rGfc9c59c355cb: [SLP]Do not emit extract elements for insertelements users, replace with… (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rGfc9c59c355cb: [SLP]Do not emit extract elements for insertelements users, replace with….

It looks like this patch is causing SLPVectorizer to crash with the following IR. This blocks building SPEC on X86, so I'll go ahead and revert this for now to unblock testing.

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx"

define i64 @foo(ptr %arg, i32 %arg1) unnamed_addr #0 {
bb:
  %tmp = sub i32 undef, undef
  %tmp2 = sub nsw i32 undef, %tmp
  %tmp3 = add i32 undef, %tmp2
  %tmp4 = xor i32 %tmp3, undef
  %tmp5 = add i32 undef, %tmp4
  %tmp6 = sub i32 undef, undef
  %tmp7 = load i32, ptr undef, align 4
  %tmp8 = sub i32 %tmp7, undef
  %tmp9 = sub nsw i32 0, undef
  %tmp10 = add nsw i32 %tmp8, %tmp6
  %tmp11 = sub nsw i32 %tmp6, %tmp8
  %tmp12 = add i32 undef, %tmp10
  %tmp13 = xor i32 %tmp12, undef
  %tmp14 = add i32 undef, %tmp9
  %tmp15 = xor i32 %tmp14, undef
  %tmp16 = add i32 undef, %tmp11
  %tmp17 = xor i32 %tmp16, undef
  %tmp18 = add i32 %tmp13, %tmp5
  %tmp19 = add i32 %tmp18, undef
  %tmp20 = add i32 %tmp19, %tmp15
  %tmp21 = add i32 %tmp20, %tmp17
  %tmp22 = sub i32 undef, undef
  %tmp23 = add i32 undef, undef
  %tmp24 = sub i32 undef, undef
  %tmp25 = add nsw i32 %tmp23, undef
  %tmp26 = add nsw i32 %tmp24, %tmp22
  %tmp27 = sub nsw i32 %tmp22, %tmp24
  %tmp28 = add i32 undef, %tmp25
  %tmp29 = xor i32 %tmp28, undef
  %tmp30 = add i32 undef, %tmp26
  %tmp31 = xor i32 %tmp30, undef
  %tmp32 = add i32 undef, %tmp27
  %tmp33 = xor i32 %tmp32, undef
  %tmp34 = add i32 %tmp31, %tmp21
  %tmp35 = add i32 %tmp34, %tmp29
  %tmp36 = add i32 %tmp35, undef
  %tmp37 = add i32 %tmp36, %tmp33
  %tmp38 = sub nsw i32 undef, undef
  %tmp39 = add i32 undef, %tmp38
  %tmp40 = xor i32 %tmp39, undef
  %tmp41 = add i32 undef, %tmp37
  %tmp42 = add i32 %tmp41, 0
  %tmp43 = add i32 %tmp42, %tmp40
  %tmp44 = add i32 %tmp43, undef
  %tmp45 = add i32 undef, %tmp44
  %tmp46 = add i32 %tmp45, undef
  %tmp47 = add i32 %tmp46, undef
  %tmp48 = add i32 %tmp47, 0
  %tmp49 = add i32 undef, %tmp48
  %tmp50 = add i32 %tmp49, undef
  %tmp51 = add i32 %tmp50, undef
  %tmp52 = add i32 %tmp51, 0
  %tmp53 = add i32 undef, %tmp52
  %tmp54 = add i32 %tmp53, undef
  %tmp55 = add i32 %tmp54, undef
  %tmp56 = add i32 %tmp55, 0
  %tmp57 = add i32 0, %tmp56
  %tmp58 = add i32 %tmp57, 0
  %tmp59 = add i32 %tmp58, 0
  %tmp60 = add i32 %tmp59, 0
  %tmp61 = lshr i32 %tmp60, 16
  %tmp62 = add nuw nsw i32 undef, %tmp61
  %tmp63 = sub nsw i32 %tmp62, undef
  %tmp64 = zext i32 %tmp63 to i64
  %tmp65 = shl nuw i64 %tmp64, 32
  %tmp66 = add i64 %tmp65, undef
  ret i64 %tmp66
}

attributes #0 = { "target-features"="+64bit,+adx,+aes,+avx,+avx2" }

fhahn added a reverting change: rGaeb19817d66f: Revert "[SLP]Do not emit extract elements for insertelements users, replace…. May 21 2022, 1:01 PM

ABataev added a commit: rG2ac5ebedeac4: [SLP]Do not emit extract elements for insertelements users, replace with…. May 23 2022, 7:09 AM

Unfortunately, the latest version is still causing crashes when building SPEC2017 on X86. Reproducer below:

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx"

%struct.hoge = type { [7 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [4 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [7 x void (i8*, i8*, i8*, i8*, i32, i32*)*], [7 x void (i8*, i8*, i8*, i8*, i8*, i32, i32*)*], [7 x i32 (i8*, i32, i8*, i32)*], i32 (i8*, i32, i8*, i32, i32*)*, [4 x i64 (i8*, i32)*], [4 x i64 (i8*, i32)*], void (i8*, i32, i8*, i32, [4 x i32]*)*, float ([4 x i32]*, [4 x i32]*, i32)*, [7 x void (i8*, i8*, i8*, i8*, i32, i32*)*], [7 x void (i8*, i8*, i8*, i8*, i8*, i32, i32*)*], [7 x void (i8*, i8*, i8*, i8*, i32, i32*)*], [7 x void (i8*, i8*, i8*, i8*, i8*, i32, i32*)*], [7 x i32 (i32*, i16*, i32, i16*, i16*, i32, i32)*], void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)* }

define i64 @quux.51(i8* %arg, i32 %arg1) unnamed_addr #0 {
bb:
  %tmp = add i32 undef, undef
  %tmp2 = sub i32 undef, undef
  %tmp3 = add i32 undef, undef
  %tmp4 = sub i32 undef, undef
  %tmp5 = add nsw i32 %tmp3, %tmp
  %tmp6 = sub nsw i32 %tmp, %tmp3
  %tmp7 = add nsw i32 %tmp4, %tmp2
  %tmp8 = sub nsw i32 %tmp2, %tmp4
  %tmp9 = add i32 undef, %tmp5
  %tmp10 = xor i32 %tmp9, undef
  %tmp11 = add i32 undef, %tmp7
  %tmp12 = xor i32 %tmp11, undef
  %tmp13 = add i32 undef, %tmp6
  %tmp14 = xor i32 %tmp13, undef
  %tmp15 = add i32 undef, %tmp8
  %tmp16 = xor i32 %tmp15, undef
  %tmp17 = add i32 %tmp12, %tmp10
  %tmp18 = add i32 %tmp17, %tmp14
  %tmp19 = add i32 %tmp18, %tmp16
  %tmp20 = add i32 undef, undef
  %tmp21 = sub i32 undef, undef
  %tmp22 = add i32 undef, undef
  %tmp23 = sub i32 undef, undef
  %tmp24 = add nsw i32 %tmp22, %tmp20
  %tmp25 = sub nsw i32 %tmp20, %tmp22
  %tmp26 = add nsw i32 %tmp23, %tmp21
  %tmp27 = sub nsw i32 %tmp21, %tmp23
  %tmp28 = add i32 undef, %tmp24
  %tmp29 = xor i32 %tmp28, undef
  %tmp30 = add i32 undef, %tmp26
  %tmp31 = xor i32 %tmp30, undef
  %tmp32 = add i32 0, %tmp25
  %tmp33 = xor i32 %tmp32, 0
  %tmp34 = add i32 undef, %tmp27
  %tmp35 = xor i32 %tmp34, undef
  %tmp36 = add i32 %tmp31, %tmp19
  %tmp37 = add i32 %tmp36, %tmp29
  %tmp38 = add i32 %tmp37, %tmp33
  %tmp39 = add i32 %tmp38, %tmp35
  %tmp40 = add i32 undef, undef
  %tmp41 = sub i32 undef, undef
  %tmp42 = add i32 undef, undef
  %tmp43 = sub i32 undef, undef
  %tmp44 = add nsw i32 %tmp42, %tmp40
  %tmp45 = sub nsw i32 %tmp40, %tmp42
  %tmp46 = add nsw i32 %tmp43, %tmp41
  %tmp47 = sub nsw i32 %tmp41, %tmp43
  %tmp48 = add i32 undef, %tmp44
  %tmp49 = xor i32 %tmp48, undef
  %tmp50 = add i32 undef, %tmp46
  %tmp51 = xor i32 %tmp50, undef
  %tmp52 = add i32 undef, %tmp45
  %tmp53 = xor i32 %tmp52, undef
  %tmp54 = add i32 undef, %tmp47
  %tmp55 = xor i32 %tmp54, undef
  %tmp56 = add i32 %tmp51, %tmp39
  %tmp57 = add i32 %tmp56, %tmp49
  %tmp58 = add i32 %tmp57, %tmp53
  %tmp59 = add i32 %tmp58, %tmp55
  %tmp60 = load i32, i32* undef, align 4
  %tmp61 = add i32 undef, %tmp60
  %tmp62 = sub i32 %tmp60, undef
  %tmp63 = add i32 undef, undef
  %tmp64 = sub i32 undef, undef
  %tmp65 = add nsw i32 %tmp63, %tmp61
  %tmp66 = sub nsw i32 %tmp61, %tmp63
  %tmp67 = add nsw i32 %tmp64, %tmp62
  %tmp68 = sub nsw i32 %tmp62, %tmp64
  %tmp69 = add i32 undef, %tmp65
  %tmp70 = xor i32 %tmp69, undef
  %tmp71 = add i32 undef, %tmp67
  %tmp72 = xor i32 %tmp71, undef
  %tmp73 = add i32 undef, %tmp66
  %tmp74 = xor i32 %tmp73, undef
  %tmp75 = add i32 undef, %tmp68
  %tmp76 = xor i32 %tmp75, undef
  %tmp77 = add i32 %tmp72, %tmp59
  %tmp78 = add i32 %tmp77, %tmp70
  %tmp79 = add i32 %tmp78, %tmp74
  %tmp80 = add i32 %tmp79, %tmp76
  %tmp81 = add i32 undef, undef
  %tmp82 = sub i32 undef, undef
  %tmp83 = add i32 undef, undef
  %tmp84 = sub i32 undef, undef
  %tmp85 = add nsw i32 %tmp83, %tmp81
  %tmp86 = sub nsw i32 %tmp81, %tmp83
  %tmp87 = add nsw i32 %tmp84, %tmp82
  %tmp88 = sub nsw i32 %tmp82, %tmp84
  %tmp89 = add i32 undef, %tmp85
  %tmp90 = xor i32 %tmp89, undef
  %tmp91 = add i32 undef, %tmp87
  %tmp92 = xor i32 %tmp91, undef
  %tmp93 = add i32 undef, %tmp86
  %tmp94 = xor i32 %tmp93, undef
  %tmp95 = add i32 undef, %tmp88
  %tmp96 = xor i32 %tmp95, undef
  %tmp97 = add i32 %tmp92, %tmp80
  %tmp98 = add i32 %tmp97, %tmp90
  %tmp99 = add i32 %tmp98, %tmp94
  %tmp100 = add i32 %tmp99, %tmp96
  %tmp101 = add i32 undef, undef
  %tmp102 = sub i32 undef, undef
  %tmp103 = add i32 undef, undef
  %tmp104 = sub i32 undef, undef
  %tmp105 = add nsw i32 %tmp103, %tmp101
  %tmp106 = sub nsw i32 %tmp101, %tmp103
  %tmp107 = add nsw i32 %tmp104, %tmp102
  %tmp108 = sub nsw i32 %tmp102, %tmp104
  %tmp109 = add i32 undef, %tmp105
  %tmp110 = xor i32 %tmp109, undef
  %tmp111 = add i32 undef, %tmp107
  %tmp112 = xor i32 %tmp111, undef
  %tmp113 = add i32 undef, %tmp106
  %tmp114 = xor i32 %tmp113, undef
  %tmp115 = add i32 undef, %tmp108
  %tmp116 = xor i32 %tmp115, undef
  %tmp117 = add i32 %tmp112, %tmp100
  %tmp118 = add i32 %tmp117, %tmp110
  %tmp119 = add i32 %tmp118, %tmp114
  %tmp120 = add i32 %tmp119, %tmp116
  %tmp121 = add i32 undef, undef
  %tmp122 = sub i32 undef, undef
  %tmp123 = add i32 undef, undef
  %tmp124 = sub i32 undef, undef
  %tmp125 = add nsw i32 %tmp123, %tmp121
  %tmp126 = sub nsw i32 %tmp121, %tmp123
  %tmp127 = add nsw i32 %tmp124, %tmp122
  %tmp128 = sub nsw i32 %tmp122, %tmp124
  %tmp129 = add i32 undef, %tmp125
  %tmp130 = xor i32 %tmp129, undef
  %tmp131 = add i32 undef, %tmp127
  %tmp132 = xor i32 %tmp131, undef
  %tmp133 = add i32 undef, %tmp126
  %tmp134 = xor i32 %tmp133, undef
  %tmp135 = add i32 undef, %tmp128
  %tmp136 = xor i32 %tmp135, undef
  %tmp137 = add i32 %tmp132, %tmp120
  %tmp138 = add i32 %tmp137, %tmp130
  %tmp139 = add i32 %tmp138, %tmp134
  %tmp140 = add i32 %tmp139, %tmp136
  %tmp141 = add i32 undef, undef
  %tmp142 = sub i32 undef, undef
  %tmp143 = add i32 undef, undef
  %tmp144 = sub i32 undef, undef
  %tmp145 = add nsw i32 %tmp143, %tmp141
  %tmp146 = sub nsw i32 %tmp141, %tmp143
  %tmp147 = add nsw i32 %tmp144, %tmp142
  %tmp148 = sub nsw i32 %tmp142, %tmp144
  %tmp149 = add i32 undef, %tmp145
  %tmp150 = xor i32 %tmp149, undef
  %tmp151 = add i32 undef, %tmp147
  %tmp152 = xor i32 %tmp151, undef
  %tmp153 = add i32 undef, %tmp146
  %tmp154 = xor i32 %tmp153, undef
  %tmp155 = add i32 undef, %tmp148
  %tmp156 = xor i32 %tmp155, undef
  %tmp157 = add i32 %tmp152, %tmp140
  %tmp158 = add i32 %tmp157, %tmp150
  %tmp159 = add i32 %tmp158, %tmp154
  %tmp160 = add i32 %tmp159, %tmp156
  %tmp161 = and i32 %tmp160, 65535
  %tmp162 = add nuw nsw i32 %tmp161, undef
  %tmp163 = sub nsw i32 %tmp162, undef
  %tmp164 = zext i32 %tmp163 to i64
  %tmp165 = shl nuw i64 %tmp164, 32
  %tmp166 = add i64 %tmp165, undef
  ret i64 %tmp166
}

attributes #0 = { "target-features"="+64bit,+adx,+aes,+avx,+avx2" }

In D107966#3533620, @fhahn wrote:

Unfortunately, the latest version is still causing crashes when building SPEC2017 on X86. Reproducer below:

Hi Florian, I tried to reproduce this but was unable to:

opt -slp-vectorizer -S ./repro1.ll
; ModuleID = './repro1.ll'
source_filename = "./repro1.ll"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx"

define i64 @quux.51(i8* %arg, i32 %arg1) unnamed_addr #0 {
bb:
  %tmp60 = load i32, i32* undef, align 4
  %0 = insertelement <32 x i32> poison, i32 %tmp60, i32 0
  %shuffle = shufflevector <32 x i32> %0, <32 x i32> poison, <32 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %1 = add <32 x i32> %shuffle, poison
  %2 = sub <32 x i32> %shuffle, poison
  %3 = shufflevector <32 x i32> %1, <32 x i32> %2, <32 x i32> <i32 0, i32 33, i32 2, i32 35, i32 36, i32 5, i32 6, i32 39, i32 40, i32 9, i32 10, i32 43, i32 44, i32 13, i32 14, i32 47, i32 48, i32 17, i32 18, i32 51, i32 52, i32 21, i32 22, i32 55, i32 56, i32 25, i32 26, i32 59, i32 60, i32 29, i32 30, i32 63>
  %4 = shufflevector <32 x i32> %3, <32 x i32> poison, <32 x i32> <i32 2, i32 3, i32 0, i32 1, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12, i32 19, i32 18, i32 17, i32 16, i32 23, i32 22, i32 21, i32 20, i32 27, i32 26, i32 25, i32 24, i32 31, i32 30, i32 29, i32 28>
  %5 = add nsw <32 x i32> %3, %4
  %6 = sub nsw <32 x i32> %3, %4
  %7 = shufflevector <32 x i32> %5, <32 x i32> %6, <32 x i32> <i32 0, i32 1, i32 34, i32 35, i32 4, i32 5, i32 38, i32 39, i32 8, i32 9, i32 42, i32 43, i32 12, i32 13, i32 46, i32 47, i32 16, i32 17, i32 50, i32 51, i32 20, i32 21, i32 54, i32 55, i32 24, i32 25, i32 58, i32 59, i32 28, i32 29, i32 62, i32 63>
  %8 = add <32 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>, %7
  %9 = xor <32 x i32> %8, <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %10 = call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %9)
  %tmp161 = and i32 %10, 65535
  %tmp162 = add nuw nsw i32 %tmp161, undef
  %tmp163 = sub nsw i32 %tmp162, undef
  %tmp164 = zext i32 %tmp163 to i64
  %tmp165 = shl nuw i64 %tmp164, 32
  %tmp166 = add i64 %tmp165, undef
  ret i64 %tmp166
}

; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1

attributes #0 = { "target-features"="+64bit,+adx,+aes,+avx,+avx2" }
attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }

Could you check one more time, please?

In D107966#3533879, @ABataev wrote:

Could you check one more time, please?

Yeah I just checked and this crashes for me with a release + assert build (commit is 96323c9f4c10bef5cb5d527970cabc73eab8aa21)

The assertion is: Assertion failed: (II && "Must be an insertelement instruction."), function vectorizeTree, file SLPVectorizer.cpp, line 8543.

In D107966#3533907, @fhahn wrote:

In D107966#3533879, @ABataev wrote:

Could you check one more time, please?

Yeah I just checked and this crashes for me with a release + assert build (commit is 96323c9f4c10bef5cb5d527970cabc73eab8aa21)

The assertion is: Assertion failed: (II && "Must be an insertelement instruction."), function vectorizeTree, file SLPVectorizer.cpp, line 8543.

Checked on the debug build, will check with rel+assert

In D107966#3533907, @fhahn wrote:

In D107966#3533879, @ABataev wrote:

Could you check one more time, please?

Yeah I just checked and this crashes for me with a release + assert build (commit is 96323c9f4c10bef5cb5d527970cabc73eab8aa21)

The assertion is: Assertion failed: (II && "Must be an insertelement instruction."), function vectorizeTree, file SLPVectorizer.cpp, line 8543.

Still unable to reproduce but I'll try to investigate it.

In D107966#3533961, @ABataev wrote:

In D107966#3533907, @fhahn wrote:

In D107966#3533879, @ABataev wrote:

Could you check one more time, please?

Yeah I just checked and this crashes for me with a release + assert build (commit is 96323c9f4c10bef5cb5d527970cabc73eab8aa21)

The assertion is: Assertion failed: (II && "Must be an insertelement instruction."), function vectorizeTree, file SLPVectorizer.cpp, line 8543.

Still unable to reproduce but I'll try to investigate it.

I'm building on macOS which defaults to using libc++. It's possible that this may be the reason why you are not seeing the crash. I left an inline comment for a sort call. Replacing this with stable_sort fixes the crash.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7323	Is it possible that the relative order of elements that compare as equal matters in the code below? With stable_sort, I am not seeing the crash.

ABataev added inline comments. May 24 2022, 5:36 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7323	Let me check. Yeah, this is most probably caused by the libc++ difference. I used sort here because I expected no difference between the sort and stable_sort results.

ABataev added inline comments. May 24 2022, 6:09 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7323	Could you check again after f9c806ae5c53c990a935c46ba351cdcfb1271c58?

fhahn added inline comments.May 27 2022, 5:03 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7323	It doesn't crash any longer, thanks!

ABataev added inline comments.May 27 2022, 5:27 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7323	Great!
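
The sort versus stable_sort point in this thread can be illustrated with a small standalone sketch. It is not the code at line 7323; `Entry`, `Key`, `Tag`, and `sortByKeyStable` are hypothetical names. `std::sort` gives no guarantee about the relative order of elements that compare equal, so different standard library implementations may legitimately order them differently, while `std::stable_sort` keeps their original order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: Key is the only field the comparator looks at,
// Tag records the original arrival order.
struct Entry {
  int Key;
  std::string Tag;
};

// Sorts by Key only. Entries with equal keys keep their original relative
// order, which std::sort does not promise.
std::vector<Entry> sortByKeyStable(std::vector<Entry> Entries) {
  std::stable_sort(
      Entries.begin(), Entries.end(),
      [](const Entry &L, const Entry &R) { return L.Key < R.Key; });
  return Entries;
}
```

If later code depends on the order of equal-key elements, either the comparator has to break ties explicitly or the sort has to be stable.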

Our SPEC build on PowerPC failed due to this patch. The following PR (gd.ll) was extracted from the gcc_r build:

target datalayout = "E-m:a-p:32:32-i64:64-n32"
target triple = "powerpc-ibm-aix7.2.0.0"

%union.tree_node = type { %struct.tree_optimization_option }
%struct.tree_optimization_option = type { %struct.tree_common, %struct.cl_optimization }
%struct.tree_common = type { %struct.tree_base, %union.tree_node*, %union.tree_node* }
%struct.tree_base = type { i64 }
%struct.cl_optimization = type { i32 }
%struct.c_declarator = type { i32, %struct.c_declarator*, i32, %union.anon.1 }
%union.anon.1 = type { %struct.anon.443 }
%struct.anon.443 = type { %union.tree_node*, i32, %union.tree_node*, i8 }
%struct.c_declspecs = type { %union.tree_node*, %union.tree_node*, %union.tree_node*, %union.tree_node*, i32, i32, i8, i32, i16, i8 }

@flag_isoc99 = internal unnamed_addr global i1 false, align 4
@pedantic = internal global i32 0, align 4

; Function Attrs: nounwind
define fastcc %union.tree_node* @grokdeclarator(%struct.c_declarator* noundef readonly %declarator, %struct.c_declspecs* nocapture noundef %declspecs) unnamed_addr #0 {
entry:
  %type = getelementptr inbounds %struct.c_declspecs, %struct.c_declspecs* %declspecs, i32 0, i32 0
  %thread_p = getelementptr inbounds %struct.c_declspecs, %struct.c_declspecs* %declspecs, i32 0, i32 8
  %p0 = bitcast %struct.c_declarator* %declarator to i64*
  %t0 = load i64, i64* %p0, align 8
  %cmp00 = icmp eq i64 %t0, 0
  br i1 %cmp00, label %if.end10, label %cleanup

if.end10:                                         ; preds = %entry
  %t1 = load %union.tree_node*, %union.tree_node** %type, align 4
  %t2 = getelementptr %union.tree_node, %union.tree_node* %t1, i32 0, i32 0, i32 0, i32 0, i32 0
  %bf.load1 = load i64, i64* %t2, align 8
  %bf.lshr.mask5.i = and i64 %bf.load1, -281474976710656
  %cmp10 = icmp eq i64 %bf.lshr.mask5.i, 4222124650659840
  %extract.t814 = trunc i64 %bf.load1 to i8
  %extract.t817 = trunc i64 %bf.load1 to i32
  %extract819 = lshr i64 %bf.load1, 43
  %extract.t820 = trunc i64 %extract819 to i32
  %extract823 = lshr i64 %bf.load1, 44
  %extract.t824 = trunc i64 %extract823 to i32
  br i1 %cmp10, label %if.then20, label %if.else20

if.then20:                                        ; preds = %if.end10
  %type1.i33 = getelementptr inbounds %union.tree_node, %union.tree_node* %t1, i32 0, i32 0, i32 0, i32 2
  %t3 = load %union.tree_node*, %union.tree_node** %type1.i33, align 4
  %t4 = getelementptr %union.tree_node, %union.tree_node* %t3, i32 0, i32 0, i32 0, i32 0, i32 0
  %bf.load2 = load i64, i64* %t4, align 8
  %extract.t = trunc i64 %bf.load2 to i8
  %extract.t816 = trunc i64 %bf.load2 to i32
  %extract = lshr i64 %bf.load2, 43
  %extract.t818 = trunc i64 %extract to i32
  %extract821 = lshr i64 %bf.load2, 44
  %extract.t822 = trunc i64 %extract821 to i32
  br label %if.else20

if.else20:                                        ; preds = %if.then20, %if.end10
  %bf.load.off0 = phi i8 [ %extract.t, %if.then20 ], [ %extract.t814, %if.end10 ]
  %bf.load.off0815 = phi i32 [ %extract.t816, %if.then20 ], [ %extract.t817, %if.end10 ]
  %bf.load.off43 = phi i32 [ %extract.t818, %if.then20 ], [ %extract.t820, %if.end10 ]
  %bf.load.off44 = phi i32 [ %extract.t822, %if.then20 ], [ %extract.t824, %if.end10 ]
  %type.addr.0.lcssa.i = phi %union.tree_node* [ %t3, %if.then20 ], [ %t1, %if.end10 ]
  %p5 = getelementptr inbounds %union.tree_node, %union.tree_node* %type.addr.0.lcssa.i, i32 0, i32 0, i32 1, i32 0
  %p9 = getelementptr inbounds %struct.c_declspecs, %struct.c_declspecs* %declspecs, i32 0, i32 9
  %bf.load154 = load i16, i16* %thread_p, align 4
  %bf.lshr155 = lshr i16 %bf.load154, 7
  %bf.clear156 = and i16 %bf.lshr155, 1
  %bf.cast157 = zext i16 %bf.clear156 to i32
  %bf.cast162 = and i32 %bf.load.off43, 1
  %add = add nuw nsw i32 %bf.cast162, %bf.cast157
  %bf.load168 = load i32, i32* %p5, align 4
  %bf.lshr169 = lshr i32 %bf.load168, 18
  %t6 = insertelement <2 x i16> poison, i16 %bf.load154, i64 0
  %t7 = shufflevector <2 x i16> %t6, <2 x i16> poison, <2 x i32> zeroinitializer
  %t8 = lshr <2 x i16> %t7, <i16 5, i16 6>
  %t9 = and <2 x i16> %t8, <i16 1, i16 1>
  %t10 = zext <2 x i16> %t9 to <2 x i32>
  %t11 = insertelement <2 x i32> poison, i32 %bf.lshr169, i64 0
  %t12 = insertelement <2 x i32> %t11, i32 %bf.load.off44, i64 1
  %t13 = and <2 x i32> %t12, <i32 1, i32 1>
  %t14 = add nuw nsw <2 x i32> %t13, %t10
  %t15 = load i8, i8* %p9, align 2
  %conv188 = zext i8 %t15 to i32
  %cmp20 = icmp eq i8 %t15, 0
  %conv192 = and i32 %bf.load.off0815, 255
  %cond196 = select i1 %cmp20, i32 %bf.load.off0815, i32 %conv188
  %t16 = load i32, i32* @pedantic, align 4
  %cmp30 = icmp eq i32 %t16, 0
  %.b28 = load i1, i1* @flag_isoc99, align 4
  %t17 = insertelement <2 x i1> poison, i1 %cmp20, i64 0
  %t18 = insertelement <2 x i1> %t17, i1 %cmp30, i64 1
  %t19 = zext <2 x i1> %t18 to <2 x i64>
  %or.cond1969 = select i1 %cmp30, i1 true, i1 %.b28
  br i1 %or.cond1969, label %cleanup, label %if.else30

if.else30:                                        ; preds = %if.else20
  %cmp40 = icmp ugt i32 %add, 1
  br i1 %cmp40, label %if.then40, label %if.end40

if.then40:                                        ; preds = %if.else30
  br label %if.end40

if.end40:                                         ; preds = %if.then40, %if.else30
  %t20 = extractelement <2 x i32> %t14, i64 0
  %cmp50 = icmp ugt i32 %t20, 1
  br i1 %cmp50, label %if.then50, label %if.end50

if.then50:                                        ; preds = %if.end40
  br label %if.end50

if.end50:                                         ; preds = %if.then50, %if.end40
  br label %cleanup

cleanup:                                          ; preds = %if.end50, %if.else20, %entry
  ret %union.tree_node* null
}

attributes #0 = { nounwind "approx-func-fp-math"="true" "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="pwr10" "target-features"="+altivec,+bpermd,+crbits,+crypto,+direct-move,+extdiv,+isa-v206-instructions,+isa-v207-instructions,+isa-v30-instructions,+isa-v31-instructions,+mma,+paired-vector-memops,+pcrelative-memops,+power10-vector,+power8-vector,+power9-vector,+prefix-instrs,+vsx,-htm,-privileged,-quadword-atomics,-rop-protect,-spe" }

Here is the crash dump with the latest SLPVectorizer.cpp (as of June 16). To reproduce:

opt -slp-vectorizer gd.ll

opt: llvm/main/llvm-project/llvm/lib/IR/Instructions.cpp:2012: llvm::ShuffleVectorInst::ShuffleVectorInst(llvm::Value *, llvm::Value *, ArrayRef<int>, const llvm::Twine &, llvm::Instruction *): Assertion `isValidOperands(V1, V2, Mask) && "Invalid shuffle vector instruction operands!"' failed.

Stack dump:
0. Program arguments: llvm/main/build/bin/opt -slp-vectorizer gd.ll
#0 0x0000000012ea16d4 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (llvm/main/build/bin/opt+0x12ea16d4)
#1 0x0000000012ea1af4 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
#2 0x0000000012e9e818 llvm::sys::RunSignalHandlers() (llvm/main/build/bin/opt+0x12e9e818)
#3 0x0000000012ea1dbc SignalHandler(int) Signals.cpp:0:0
#4 0x00007d17768b04c8 (linux-vdso64.so.1+0x4c8)
#5 0x00007d1776130468 __libc_signal_restore_set /build/glibc-tRXAGY/glibc-2.31/signal/../sysdeps/unix/sysv/linux/internal-signals.h:86:3
#6 0x00007d1776130468 raise /build/glibc-tRXAGY/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:48:3
#7 0x00007d1776107cd0 abort /build/glibc-tRXAGY/glibc-2.31/stdlib/abort.c:79:7
#8 0x00007d177611f5dc __assert_fail_base /build/glibc-tRXAGY/glibc-2.31/assert/assert.c:92:3
#9 0x00007d177611f680 __assert_fail /build/glibc-tRXAGY/glibc-2.31/assert/assert.c:101:3
#10 0x00000000124870cc llvm::ShuffleVectorInst::ShuffleVectorInst(llvm::Value*, llvm::Value*, llvm::ArrayRef<int>, llvm::Twine const&, llvm::Instruction*) (llvm/main/build/bin/opt+0x124870cc)
#11 0x000000001064b62c llvm::IRBuilderBase::CreateShuffleVector(llvm::Value*, llvm::Value*, llvm::ArrayRef<int>, llvm::Twine const&) (llvm/main/build/bin/opt+0x1064b62c)
#12 0x000000001318a698 llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseMapPair<llvm::Value*, unsigned int>>, std::vector<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>>, std::allocator<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>>>>>&)::$_69::operator()(llvm::Value*, llvm::Value*, llvm::ArrayRef<int>) const SLPVectorizer.cpp:0:0
#13 0x000000001314098c llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseMapPair<llvm::Value*, unsigned int>>, std::vector<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>>, std::allocator<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>>>>>&) (llvm/main/build/bin/opt+0x1314098c)
#14 0x0000000013150de0 llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) (llvm/main/build/bin/opt+0x13150de0)
......

Can you please take a look? Thanks!
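
For readers unfamiliar with the assertion above, here is a simplified standalone model of the kind of check that fails. It assumes only two of the rules for a shuffle: both source vectors must have the same type, and every defined mask index must be smaller than twice the source length. `VecDesc` and `isPlausibleShuffle` are hypothetical names, not LLVM API, and this sketch does not claim to identify which rule the reproducer violates.

```cpp
#include <cassert>
#include <vector>

// -1 marks an undefined mask lane in this sketch.
constexpr int UndefMaskElem = -1;

// Hypothetical stand-in for a fixed-width vector type.
struct VecDesc {
  unsigned ElemBits; // width of each element in bits
  unsigned NumElems; // number of lanes
};

// Returns true if the two sources have identical types and every defined
// mask index selects a lane from the concatenation of the two sources.
bool isPlausibleShuffle(const VecDesc &V1, const VecDesc &V2,
                        const std::vector<int> &Mask) {
  if (V1.ElemBits != V2.ElemBits || V1.NumElems != V2.NumElems)
    return false;
  const int Limit = 2 * static_cast<int>(V1.NumElems);
  for (int Idx : Mask)
    if (Idx != UndefMaskElem && (Idx < 0 || Idx >= Limit))
      return false;
  return true;
}
```

Under this model, shuffling a `<2 x i16>` with a `<2 x i32>` is rejected, as is any mask index of 4 or more for two-lane sources.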

Attachments: reduced.ll (12 KB), patch.txt (1 KB)

There is another issue that I tracked down to this patch, but it is somewhat hidden. To reveal it, please apply the attached patch, which enables expensive checks and adds a verifyFunction call right after the vectorized code is generated.

Crash looks like this:
Instruction does not dominate all uses!

%41 = insertelement <4 x i32> %40, i32 %32, i32 1
%39 = insertelement <4 x i32> %41, i32 poison, i32 2

opt: /path/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8404: llvm::Value* llvm::slpvectorizer::BoUpSLP::vectorizeTree(): Assertion `!verifyFunction(*F, &dbgs()) && "Broken after vec"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: bin/opt -slp-vectorizer -mcpu=skylake -disable-output reduced.ll

Hi Valery, thanks for the report, will prepare the fix later today or tomorrow

Investigated. This is not quite a bug, but some junk is left that requires cleanup. I'll add the code to do this extra cleanup to avoid any problems, plus, I believe it may improve compile time in some cases.
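
As an illustration of what the "Instruction does not dominate all uses!" message reported in this thread means in the simplest case, the sketch below checks a single basic block for a use that appears before its definition. The real verifier works on the dominator tree across blocks; `Inst` and `firstUseBeforeDef` are hypothetical names used only for this model.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical single-block model: each instruction defines the value Id
// and uses the values listed in Operands.
struct Inst {
  int Id;
  std::vector<int> Operands;
};

// Returns the position of the first instruction that uses a value defined
// later in the same block, or -1 if every such use follows its definition.
// Operands not defined in the block (arguments, constants) are ignored.
int firstUseBeforeDef(const std::vector<Inst> &Block) {
  std::unordered_set<int> DefinedInBlock;
  for (const Inst &I : Block)
    DefinedInBlock.insert(I.Id);

  std::unordered_set<int> Seen;
  for (std::size_t Pos = 0; Pos < Block.size(); ++Pos) {
    for (int Op : Block[Pos].Operands)
      if (DefinedInBlock.count(Op) && !Seen.count(Op))
        return static_cast<int>(Pos);
    Seen.insert(Block[Pos].Id);
  }
  return -1;
}
```

In this model, a block where value 39 uses value 41 but is placed before the definition of 41 is flagged at the position of 39.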

Revision Contents

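Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	455 lines
llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll	2 lines
llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/cmp_commute-inseltpoison.ll	10 lines
llvm/test/Transforms/SLPVectorizer/X86/cmp_commute.ll	10 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll	9 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling-inseltpoison.ll	23 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling.ll	23 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll	15 lines
llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll	24 lines
llvm/test/Transforms/SLPVectorizer/X86/extracts-with-undefs.ll	28 lines
llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll	4 lines
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll	92 lines
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll	92 lines
llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll	14 lines
llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll	7 lines
llvm/test/Transforms/SLPVectorizer/X86/jumbled-load.ll	17 lines
llvm/test/Transforms/SLPVectorizer/X86/jumbled_store_crash.ll	15 lines
llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll	5 lines
llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll	5 lines
llvm/test/Transforms/SLPVectorizer/X86/ordering-bug.ll	7 lines
llvm/test/Transforms/SLPVectorizer/X86/phi.ll	60 lines
llvm/test/Transforms/SLPVectorizer/X86/pr42022-inseltpoison.ll	14 lines
llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll	54 lines
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll	10 lines
llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll	11 lines
llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll	11 lines
llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll	16 lines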

Diff 394242

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[5,558 lines hidden; context: if (IE2) {]
IE2 = nullptr;		IE2 = nullptr;
else		else
IE2 = dyn_cast<InsertElementInst>(IE2->getOperand(0));		IE2 = dyn_cast<InsertElementInst>(IE2->getOperand(0));
}		}
} while (IE1 || IE2);		} while (IE1 || IE2);
return false;		return false;
}		}

		/// Checks if the \p IE1 instructions is followed by \p IE2 instruction in the
		/// buildvector sequence.
		static bool isFirstInsertElement(const InsertElementInst *IE1,
		const InsertElementInst *IE2) {
		const auto *I1 = IE1;
		const auto *I2 = IE2;
		do {
		if (I2 == IE1)
		return true;
		if (I1 == IE2)
		return false;
		if (I1)
		I1 = dyn_cast<InsertElementInst>(I1->getOperand(0));
		if (I2)
		I2 = dyn_cast<InsertElementInst>(I2->getOperand(0));
		} while (I1 || I2);
		llvm_unreachable("Two different buildvectors not expected.");
		}
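
The lockstep walk used by isFirstInsertElement can be sketched outside LLVM as follows. This is a minimal model, assuming each insert is a node whose `Prev` pointer plays the role of operand 0 when that operand is itself an insert; `InsertNode`, `ChainOrder`, and `chainOrder` are hypothetical names. Unlike the patch, which treats two unrelated chains as unreachable, the sketch returns a third result for that case.

```cpp
#include <cassert>

// Hypothetical stand-in for an insertelement instruction. Prev is the
// previous insert in the buildvector chain, or nullptr at the chain start.
struct InsertNode {
  const InsertNode *Prev;
};

enum class ChainOrder { FirstIsEarlier, SecondIsEarlier, Unrelated };

// Walks both chains backwards one step at a time. Whichever start node is
// reached from the other side is the earlier insert in the sequence.
ChainOrder chainOrder(const InsertNode *IE1, const InsertNode *IE2) {
  const InsertNode *I1 = IE1;
  const InsertNode *I2 = IE2;
  do {
    if (I2 == IE1)
      return ChainOrder::FirstIsEarlier;
    if (I1 == IE2)
      return ChainOrder::SecondIsEarlier;
    if (I1)
      I1 = I1->Prev;
    if (I2)
      I2 = I2->Prev;
  } while (I1 || I2);
  return ChainOrder::Unrelated;
}
```

Stepping both pointers at once bounds the walk by the longer chain without needing to know in advance which node is older.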

		namespace {
		/// Returns incoming Value *, if the requested type is Value * too, or a default
		/// value, otherwise.
		struct ValueSelect {
		template <typename U>
		static typename std::enable_if<std::is_same<Value *, U>::value, Value *>::type
		get(Value *V) {
		return V;
		}
		template <typename U>
		static typename std::enable_if<!std::is_same<Value *, U>::value, U>::type
		get(Value *) {
		return U();
		}
		};
		} // namespace
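
The type-based selection in ValueSelect can be tried in isolation with the sketch below. `Value` and `TreeEntry` here are empty hypothetical stand-ins for the LLVM classes; only the `enable_if` dispatch is reproduced. When the requested type is `Value *` the incoming pointer is returned, and for any other type a value-initialized result is returned, which is a null pointer for pointer types.

```cpp
#include <cassert>
#include <type_traits>

// Empty stand-ins for the real classes, assumed only for this sketch.
struct Value {};
struct TreeEntry {};

struct ValueSelect {
  // Chosen when U is exactly Value *: pass the pointer through.
  template <typename U>
  static typename std::enable_if<std::is_same<Value *, U>::value, Value *>::type
  get(Value *V) {
    return V;
  }
  // Chosen for every other U: return a value-initialized U.
  template <typename U>
  static typename std::enable_if<!std::is_same<Value *, U>::value, U>::type
  get(Value *) {
    return U();
  }
};
```

This lets one template serve both the cost model, which works on tree entries and has no IR value for the base, and the code generator, which works on IR values.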

		template <typename T>
		static T *performExtractsShuffleAction(
		MutableArrayRef<std::pair<T *, SmallVector<int>>> ShuffleMask, Value *Base,
		function_ref<unsigned(T *)> GetVF,
		function_ref<std::pair<T *, bool>(T *, ArrayRef<int>)> ResizeAction,
		function_ref<T *(ArrayRef<int>, ArrayRef<T *>)> Action) {
		assert(!ShuffleMask.empty() && "Empty list of shuffles for inserts.");
		SmallVector<int> Mask(ShuffleMask.begin()->second);
		auto VMIt = std::next(ShuffleMask.begin());
		T *Prev = nullptr;
		bool IsBaseNotUndef = !isUndefVector(Base);
		if (IsBaseNotUndef) {
		// Base is not undef, need to combine it with the next subvectors.
		std::pair<T *, bool> Res = ResizeAction(ShuffleMask.begin()->first, Mask);
		for (unsigned Idx = 0, VF = Mask.size(); Idx < VF; ++Idx) {
		if (Mask[Idx] == UndefMaskElem)
		Mask[Idx] = Idx;
		else
		Mask[Idx] = (Res.second ? Idx : Mask[Idx]) + VF;
		}
		auto *V = ValueSelect::get<T *>(Base);
		assert((!V || GetVF(V) == Mask.size()) &&
		"Expected base vector of VF number of elements.");
		Prev = Action(Mask, {V, Res.first});
		} else if (ShuffleMask.size() == 1) {
		// Base is undef and only 1 vector is shuffled.
		std::pair<T *, bool> Res = ResizeAction(ShuffleMask.begin()->first, Mask);
		if (Res.second)
		Prev = Res.first;
		else
		Prev = Action(Mask, {ShuffleMask.begin()->first});
		} else {
		// Base is undef and at least 2 input vectors shuffled.
		unsigned Vec1VF = GetVF(ShuffleMask.begin()->first);
		unsigned Vec2VF = GetVF(VMIt->first);
		if (Vec1VF == Vec2VF) {
		// No need to resize the input vectors since they are of the same size, we
		// can shuffle them directly.
		ArrayRef<int> SecMask = VMIt->second;
		for (unsigned I = 0, VF = Mask.size(); I < VF; ++I) {
		if (SecMask[I] != UndefMaskElem) {
		assert(Mask[I] == UndefMaskElem && "Multiple uses of scalars.");
		Mask[I] = SecMask[I] + Vec1VF;
		}
		}
		Prev = Action(Mask, {ShuffleMask.begin()->first, VMIt->first});
		} else {
		// Vectors of different sizes - resize and reshuffle.
		std::pair<T *, bool> Res1 =
		ResizeAction(ShuffleMask.begin()->first, Mask);
		std::pair<T *, bool> Res2 = ResizeAction(VMIt->first, VMIt->second);
		ArrayRef<int> SecMask = VMIt->second;
		for (unsigned I = 0, VF = Mask.size(); I < VF; ++I) {
		if (Mask[I] != UndefMaskElem) {
		assert(SecMask[I] == UndefMaskElem && "Multiple uses of scalars.");
		if (Res1.second)
		Mask[I] = I;
		} else if (SecMask[I] != UndefMaskElem) {
		assert(Mask[I] == UndefMaskElem && "Multiple uses of scalars.");
		Mask[I] = (Res2.second ? I : SecMask[I]) + VF;
		}
		}
		Prev = Action(Mask, {Res1.first, Res2.first});
		}
		VMIt = std::next(VMIt);
		}
		for (auto E = ShuffleMask.end(); VMIt != E; ++VMIt) {
		// Shuffle other input vectors, if any.
		std::pair<T *, bool> Res = ResizeAction(VMIt->first, VMIt->second);
		ArrayRef<int> SecMask = VMIt->second;
		for (unsigned I = 0, VF = Mask.size(); I < VF; ++I) {
		if (SecMask[I] != UndefMaskElem) {
		assert((Mask[I] == UndefMaskElem || IsBaseNotUndef) &&
		"Multiple uses of scalars.");
		Mask[I] = (Res.second ? I : SecMask[I]) + VF;
		} else if (Mask[I] != UndefMaskElem) {
		Mask[I] = I;
		}
		}
		Prev = Action(Mask, {Prev, Res.first});
		}
		return Prev;
		}
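
One branch of performExtractsShuffleAction, the case where the base is undef and the first two input vectors have the same width, can be modeled with plain integer masks. This is a sketch under that assumption only; `mergeSameWidthMasks` is a hypothetical helper and `-1` stands in for UndefMaskElem. Lanes taken from the second vector are offset by the first vector's width so that the result is a valid two-source shuffle mask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// -1 marks a lane that is not filled by the given source vector.
constexpr int UndefMaskElem = -1;

// Merges the per-source masks of two equally wide vectors into a single
// two-source mask. Indices into the second source are shifted by Vec1VF.
std::vector<int> mergeSameWidthMasks(std::vector<int> Mask,
                                     const std::vector<int> &SecMask,
                                     int Vec1VF) {
  assert(Mask.size() == SecMask.size() && "masks must cover the same lanes");
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (SecMask[I] != UndefMaskElem) {
      // Each output lane may be claimed by at most one source.
      assert(Mask[I] == UndefMaskElem && "lane claimed by both sources");
      Mask[I] = SecMask[I] + Vec1VF;
    }
  }
  return Mask;
}
```

For example, with four-lane sources, first mask `{0, -1, 1, -1}` and second mask `{-1, 0, -1, 1}`, lanes 1 and 3 come from the second source and receive indices 4 and 5.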

InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {		InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
InstructionCost Cost = 0;		InstructionCost Cost = 0;
LLVM_DEBUG(dbgs() << "SLP: Calculating cost for tree of size "		LLVM_DEBUG(dbgs() << "SLP: Calculating cost for tree of size "
<< VectorizableTree.size() << ".\n");		<< VectorizableTree.size() << ".\n");

unsigned BundleWidth = VectorizableTree[0]->Scalars.size();		unsigned BundleWidth = VectorizableTree[0]->Scalars.size();

for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {		for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {
TreeEntry &TE = *VectorizableTree[I].get();		TreeEntry &TE = *VectorizableTree[I].get();

InstructionCost C = getEntryCost(&TE, VectorizedVals);		InstructionCost C = getEntryCost(&TE, VectorizedVals);
Cost += C;		Cost += C;
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
<< " for bundle that starts with " << *TE.Scalars[0]		<< " for bundle that starts with " << *TE.Scalars[0]
<< ".\n"		<< ".\n"
<< "SLP: Current total cost = " << Cost << "\n");		<< "SLP: Current total cost = " << Cost << "\n");
}		}

SmallPtrSet<Value *, 16> ExtractCostCalculated;		SmallPtrSet<Value *, 16> ExtractCostCalculated;
InstructionCost ExtractCost = 0;		InstructionCost ExtractCost = 0;
SmallVector<unsigned> VF;		SmallVector<MapVector<const TreeEntry *, SmallVector<int>>> ShuffleMasks;
SmallVector<SmallVector<int>> ShuffleMask;		SmallVector<std::pair<Value *, const TreeEntry *>> FirstUsers;
SmallVector<Value *> FirstUsers;
SmallVector<APInt> DemandedElts;		SmallVector<APInt> DemandedElts;
for (ExternalUser &EU : ExternalUses) {		for (ExternalUser &EU : ExternalUses) {
// We only add extract cost once for the same scalar.		// We only add extract cost once for the same scalar.
if (!isa_and_nonnull<InsertElementInst>(EU.User) &&		if (!isa_and_nonnull<InsertElementInst>(EU.User) &&
!ExtractCostCalculated.insert(EU.Scalar).second)		!ExtractCostCalculated.insert(EU.Scalar).second)
continue;		continue;

// Uses by ephemeral values are free (because the ephemeral value will be		// Uses by ephemeral values are free (because the ephemeral value will be
[13 lines hidden; context: for (ExternalUser &EU : ExternalUses) {]

// If found user is an insertelement, do not calculate extract cost but try		// If found user is an insertelement, do not calculate extract cost but try
// to detect it as a final shuffled/identity match.		// to detect it as a final shuffled/identity match.
if (auto *VU = dyn_cast_or_null<InsertElementInst>(EU.User)) {		if (auto *VU = dyn_cast_or_null<InsertElementInst>(EU.User)) {
if (auto *FTy = dyn_cast<FixedVectorType>(VU->getType())) {		if (auto *FTy = dyn_cast<FixedVectorType>(VU->getType())) {
Optional<int> InsertIdx = getInsertIndex(VU, 0);		Optional<int> InsertIdx = getInsertIndex(VU, 0);
if (!InsertIdx || *InsertIdx == UndefMaskElem)		if (!InsertIdx || *InsertIdx == UndefMaskElem)
continue;		continue;
auto *It = find_if(FirstUsers, [VU](Value *V) {		const TreeEntry *ScalarTE = getTreeEntry(EU.Scalar);
return areTwoInsertFromSameBuildVector(VU,		auto *It =
cast<InsertElementInst>(V));		find_if(FirstUsers,
		[VU](const std::pair<Value *, const TreeEntry *> &Pair) {
		return areTwoInsertFromSameBuildVector(
		VU, cast<InsertElementInst>(Pair.first));
});		});
		[Inline comment, RKSimon (not done)] The (almost NFC) change to areTwoInsertFromSameBuildVector looks like it can be pulled out to simplify this patch.
int VecId = -1;		int VecId = -1;
if (It == FirstUsers.end()) {		if (It == FirstUsers.end()) {
VF.push_back(FTy->getNumElements());		(void)ShuffleMasks.emplace_back();
ShuffleMask.emplace_back(VF.back(), UndefMaskElem);
// Find the insertvector, vectorized in tree, if any.		// Find the insertvector, vectorized in tree, if any.
Value *Base = VU;		Value *Base = VU;
while (isa<InsertElementInst>(Base)) {		while (auto *IEBase = dyn_cast<InsertElementInst>(Base)) {
// Build the mask for the vectorized insertelement instructions.		// Build the mask for the vectorized insertelement instructions.
if (const TreeEntry *E = getTreeEntry(Base)) {		if (const TreeEntry *E = getTreeEntry(IEBase)) {
VU = cast<InsertElementInst>(Base);		VU = IEBase;
do {		do {
int Idx = E->findLaneForValue(Base);		int Idx = E->findLaneForValue(IEBase);
ShuffleMask.back()[Idx] = Idx;		SmallVectorImpl<int> &Mask = ShuffleMasks.back()[ScalarTE];
		if (Mask.empty())
		Mask.assign(FTy->getNumElements(), UndefMaskElem);
		Mask[Idx] = Idx;
Base = cast<InsertElementInst>(Base)->getOperand(0);		Base = cast<InsertElementInst>(Base)->getOperand(0);
} while (E == getTreeEntry(Base));		} while (E == getTreeEntry(Base));
break;		break;
}		}
Base = cast<InsertElementInst>(Base)->getOperand(0);		Base = cast<InsertElementInst>(Base)->getOperand(0);
}		}
FirstUsers.push_back(VU);		FirstUsers.emplace_back(VU, ScalarTE);
DemandedElts.push_back(APInt::getZero(VF.back()));		DemandedElts.push_back(APInt::getZero(FTy->getNumElements()));
VecId = FirstUsers.size() - 1;		VecId = FirstUsers.size() - 1;
} else {		} else {
		if (isFirstInsertElement(VU, cast<InsertElementInst>(It->first)))
		It->first = VU;
VecId = std::distance(FirstUsers.begin(), It);		VecId = std::distance(FirstUsers.begin(), It);
}		}
int Idx = *InsertIdx;		int Idx = *InsertIdx;
ShuffleMask[VecId][Idx] = EU.Lane;		SmallVectorImpl<int> &Mask = ShuffleMasks[VecId][ScalarTE];
		if (Mask.empty())
		Mask.assign(FTy->getNumElements(), UndefMaskElem);
		assert(Mask[Idx] == UndefMaskElem &&
		"InsertElementInstruction used already.");
		Mask[Idx] = EU.Lane;
DemandedElts[VecId].setBit(Idx);		DemandedElts[VecId].setBit(Idx);
continue;		continue;
}		}
}		}

// If we plan to rewrite the tree in a smaller type, we will need to sign		// If we plan to rewrite the tree in a smaller type, we will need to sign
// extend the extracted value back to the original type. Here, we account		// extend the extracted value back to the original type. Here, we account
// for the extract and the added cost of the sign extend if needed.		// for the extract and the added cost of the sign extend if needed.
[9 lines hidden; context: for (ExternalUser &EU : ExternalUses) {]
} else {		} else {
ExtractCost +=		ExtractCost +=
TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, EU.Lane);		TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, EU.Lane);
}		}
}		}

InstructionCost SpillCost = getSpillCost();		InstructionCost SpillCost = getSpillCost();
Cost += SpillCost + ExtractCost;		Cost += SpillCost + ExtractCost;
if (FirstUsers.size() == 1) {		auto &&ResizeToVF = [this, &Cost](const TreeEntry *TE, ArrayRef<int> Mask) {
int Limit = ShuffleMask.front().size() * 2;		InstructionCost C = 0;
if (all_of(ShuffleMask.front(), [Limit](int Idx) { return Idx < Limit; }) &&		unsigned VF = Mask.size();
!ShuffleVectorInst::isIdentityMask(ShuffleMask.front())) {		unsigned VecVF = TE->getVectorFactor();
InstructionCost C = TTI->getShuffleCost(		if (VF != VecVF &&
		(any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); }) ||
		(all_of(Mask,
		[VF](int Idx) { return Idx < 2 * static_cast<int>(VF); }) &&
		!ShuffleVectorInst::isIdentityMask(Mask)))) {
		SmallVector<int> OrigMask(VecVF, UndefMaskElem);
		std::copy(Mask.begin(), std::next(Mask.begin(), std::min(VF, VecVF)),
		OrigMask.begin());
		C = TTI->getShuffleCost(
TTI::SK_PermuteSingleSrc,		TTI::SK_PermuteSingleSrc,
cast<FixedVectorType>(FirstUsers.front()->getType()),		FixedVectorType::get(TE->getMainOp()->getType(), VecVF), OrigMask);
ShuffleMask.front());		LLVM_DEBUG(
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		dbgs() << "SLP: Adding cost " << C
<< " for final shuffle of insertelement external users "		<< " for final shuffle of insertelement external users.\n";
<< *VectorizableTree.front()->Scalars.front() << ".\n"		TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
<< "SLP: Current total cost = " << Cost << "\n");
Cost += C;		Cost += C;
		return std::make_pair(TE, true);
}		}
InstructionCost InsertCost = TTI->getScalarizationOverhead(		return std::make_pair(TE, false);
cast<FixedVectorType>(FirstUsers.front()->getType()),		};
DemandedElts.front(), /*Insert*/ true, /*Extract*/ false);		// Calculate the cost of the reshuffled vectors, if any.
LLVM_DEBUG(dbgs() << "SLP: subtracting the cost " << InsertCost		for (int I = 0, E = FirstUsers.size(); I < E; ++I) {
<< " for insertelements gather.\n"		Value *Base = cast<Instruction>(FirstUsers[I].first)->getOperand(0);
<< "SLP: Current total cost = " << Cost << "\n");		unsigned VF = ShuffleMasks[I].begin()->second.size();
Cost -= InsertCost;		auto *FTy = FixedVectorType::get(
} else if (FirstUsers.size() >= 2) {		cast<VectorType>(FirstUsers[I].first->getType())->getElementType(), VF);
unsigned MaxVF = *std::max_element(VF.begin(), VF.end());		auto Vector = ShuffleMasks[I].takeVector();
// Combined masks of the first 2 vectors.		(void)performExtractsShuffleAction<const TreeEntry>(
SmallVector<int> CombinedMask(MaxVF, UndefMaskElem);		makeMutableArrayRef(Vector.data(), Vector.size()), Base,
copy(ShuffleMask.front(), CombinedMask.begin());		[](const TreeEntry *E) { return E->getVectorFactor(); }, ResizeToVF,
APInt CombinedDemandedElts = DemandedElts.front().zextOrSelf(MaxVF);		[this, FTy, &Cost](ArrayRef<int> Mask,
auto *VecTy = FixedVectorType::get(		ArrayRef<const TreeEntry *> TEs) {
cast<VectorType>(FirstUsers.front()->getType())->getElementType(),		assert((TEs.size() == 1 || TEs.size() == 2) &&
MaxVF);		"Expected exactly 1 or 2 tree entries.");
for (int I = 0, E = ShuffleMask[1].size(); I < E; ++I) {		if (TEs.size() == 1) {
if (ShuffleMask[1][I] != UndefMaskElem) {		int Limit = 2 * Mask.size();
CombinedMask[I] = ShuffleMask[1][I] + MaxVF;		if (!all_of(Mask, [Limit](int Idx) { return Idx < Limit; }) ||
CombinedDemandedElts.setBit(I);		!ShuffleVectorInst::isIdentityMask(Mask)) {
}
}
InstructionCost C =		InstructionCost C =
TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, VecTy, CombinedMask);		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FTy, Mask);
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
<< " for final shuffle of vector node and external "		<< " for final shuffle of insertelement "
"insertelement users "		"external users.\n";
<< *VectorizableTree.front()->Scalars.front() << ".\n"		TEs.front()->dump();
		dbgs()
<< "SLP: Current total cost = " << Cost << "\n");		<< "SLP: Current total cost = " << Cost << "\n");
Cost += C;		Cost += C;
InstructionCost InsertCost = TTI->getScalarizationOverhead(		}
VecTy, CombinedDemandedElts, /*Insert*/ true, /*Extract*/ false);		} else {
LLVM_DEBUG(dbgs() << "SLP: subtracting the cost " << InsertCost
<< " for insertelements gather.\n"
<< "SLP: Current total cost = " << Cost << "\n");
Cost -= InsertCost;
for (int I = 2, E = FirstUsers.size(); I < E; ++I) {
// Other elements - permutation of 2 vectors (the initial one and the
// next Ith incoming vector).
unsigned VF = ShuffleMask[I].size();
for (unsigned Idx = 0; Idx < VF; ++Idx) {
int Mask = ShuffleMask[I][Idx];
if (Mask != UndefMaskElem)
CombinedMask[Idx] = MaxVF + Mask;
else if (CombinedMask[Idx] != UndefMaskElem)
CombinedMask[Idx] = Idx;
}
for (unsigned Idx = VF; Idx < MaxVF; ++Idx)
if (CombinedMask[Idx] != UndefMaskElem)
CombinedMask[Idx] = Idx;
InstructionCost C =		InstructionCost C =
TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, VecTy, CombinedMask);		TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, FTy, Mask);
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		LLVM_DEBUG(dbgs()
		<< "SLP: Adding cost " << C
<< " for final shuffle of vector node and external "		<< " for final shuffle of vector node and external "
"insertelement users "		"insertelement users.\n";
<< *VectorizableTree.front()->Scalars.front() << ".\n"		if (TEs.front()) TEs.front()->dump(); TEs.back()->dump();
<< "SLP: Current total cost = " << Cost << "\n");		dbgs() << "SLP: Current total cost = " << Cost << "\n");
Cost += C;		Cost += C;
		}
		return TEs.back();
		});
InstructionCost InsertCost = TTI->getScalarizationOverhead(		InstructionCost InsertCost = TTI->getScalarizationOverhead(
cast<FixedVectorType>(FirstUsers[I]->getType()), DemandedElts[I],		cast<FixedVectorType>(FirstUsers[I].first->getType()), DemandedElts[I],
/*Insert*/ true, /*Extract*/ false);		/*Insert*/ true, /*Extract*/ false);
LLVM_DEBUG(dbgs() << "SLP: subtracting the cost " << InsertCost
<< " for insertelements gather.\n"
<< "SLP: Current total cost = " << Cost << "\n");
Cost -= InsertCost;		Cost -= InsertCost;
}		}
}

#ifndef NDEBUG		#ifndef NDEBUG
SmallString<256> Str;		SmallString<256> Str;
{		{
raw_svector_ostream OS(Str);		raw_svector_ostream OS(Str);
OS << "SLP: Spill Cost = " << SpillCost << ".\n"		OS << "SLP: Spill Cost = " << SpillCost << ".\n"
<< "SLP: Extract Cost = " << ExtractCost << ".\n"		<< "SLP: Extract Cost = " << ExtractCost << ".\n"
<< "SLP: Total Cost = " << Cost << ".\n";		<< "SLP: Total Cost = " << Cost << ".\n";
[1,072 lines hidden; context: Value *BoUpSLP::vectorizeTree(TreeEntry *E) {]
return nullptr;		return nullptr;
}		}

Value *BoUpSLP::vectorizeTree() {		Value *BoUpSLP::vectorizeTree() {
ExtraValueToDebugLocsMap ExternallyUsedValues;		ExtraValueToDebugLocsMap ExternallyUsedValues;
return vectorizeTree(ExternallyUsedValues);		return vectorizeTree(ExternallyUsedValues);
}		}

		namespace {
		/// Data type for handling buildvector sequences with the reused scalars from
		/// other tree entries.
		struct ShuffledInsertData {
		/// List of insertelements to be replaced by shuffles.
		SmallVector<InsertElementInst *> InsertElements;
		/// The parent vectors and shuffle mask for the given list of inserts.
		MapVector<Value *, SmallVector<int>> ValueMasks;
		};
		} // namespace

Value *		Value *
BoUpSLP::vectorizeTree(ExtraValueToDebugLocsMap &ExternallyUsedValues) {		BoUpSLP::vectorizeTree(ExtraValueToDebugLocsMap &ExternallyUsedValues) {
// All blocks must be scheduled before any instructions are inserted.		// All blocks must be scheduled before any instructions are inserted.
for (auto &BSIter : BlocksSchedules) {		for (auto &BSIter : BlocksSchedules) {
scheduleBlock(BSIter.second.get());		scheduleBlock(BSIter.second.get());
}		}

Builder.SetInsertPoint(&F->getEntryBlock().front());		Builder.SetInsertPoint(&F->getEntryBlock().front());
Show All 17 Lines	if (MinBWs.count(ScalarRoot)) {
auto *VecTy = FixedVectorType::get(MinTy, BundleWidth);		auto *VecTy = FixedVectorType::get(MinTy, BundleWidth);
auto *Trunc = Builder.CreateTrunc(VectorRoot, VecTy);		auto *Trunc = Builder.CreateTrunc(VectorRoot, VecTy);
VectorizableTree[0]->VectorizedValue = Trunc;		VectorizableTree[0]->VectorizedValue = Trunc;
}		}

LLVM_DEBUG(dbgs() << "SLP: Extracting " << ExternalUses.size()		LLVM_DEBUG(dbgs() << "SLP: Extracting " << ExternalUses.size()
<< " values .\n");		<< " values .\n");

		SmallVector<ShuffledInsertData> ShuffledInserts;
		// Maps vector instruction to original insertelement instruction
		DenseMap<Value *, InsertElementInst *> VectorToInsertElement;
// Extract all of the elements with the external uses.		// Extract all of the elements with the external uses.
for (const auto &ExternalUse : ExternalUses) {		for (const auto &ExternalUse : ExternalUses) {
Value *Scalar = ExternalUse.Scalar;		Value *Scalar = ExternalUse.Scalar;
llvm::User *User = ExternalUse.User;		llvm::User *User = ExternalUse.User;

// Skip users that we already RAUW. This happens when one instruction		// Skip users that we already RAUW. This happens when one instruction
// has multiple uses of the same value.		// has multiple uses of the same value.
if (User && !is_contained(Scalar->users(), User))		if (User && !is_contained(Scalar->users(), User))
Show All 23 Lines	auto ExtractAndExtendIfNeeded = [&](Value *Vec) {
return Ex;		return Ex;
if (MinBWs[ScalarRoot].second)		if (MinBWs[ScalarRoot].second)
return Builder.CreateSExt(Ex, Scalar->getType());		return Builder.CreateSExt(Ex, Scalar->getType());
return Builder.CreateZExt(Ex, Scalar->getType());		return Builder.CreateZExt(Ex, Scalar->getType());
}		}
assert(isa<FixedVectorType>(Scalar->getType()) &&		assert(isa<FixedVectorType>(Scalar->getType()) &&
isa<InsertElementInst>(Scalar) &&		isa<InsertElementInst>(Scalar) &&
"In-tree scalar of vector type is not insertelement?");		"In-tree scalar of vector type is not insertelement?");
		auto *IE = cast<InsertElementInst>(Scalar);
		VectorToInsertElement.try_emplace(Vec, IE);
return Vec;		return Vec;
};		};
// If User == nullptr, the Scalar is used as extra arg. Generate		// If User == nullptr, the Scalar is used as extra arg. Generate
// ExtractElement instruction and update the record for this scalar in		// ExtractElement instruction and update the record for this scalar in
// ExternallyUsedValues.		// ExternallyUsedValues.
if (!User) {		if (!User) {
assert(ExternallyUsedValues.count(Scalar) &&		assert(ExternallyUsedValues.count(Scalar) &&
"Scalar with nullptr as an external user must be registered in "		"Scalar with nullptr as an external user must be registered in "
Show All 12 Lines	if (!User) {
"Externally used scalar is not found in ExternallyUsedValues");		"Externally used scalar is not found in ExternallyUsedValues");
NewInstLocs.append(It->second);		NewInstLocs.append(It->second);
ExternallyUsedValues.erase(Scalar);		ExternallyUsedValues.erase(Scalar);
// Required to update internally referenced instructions.		// Required to update internally referenced instructions.
Scalar->replaceAllUsesWith(NewInst);		Scalar->replaceAllUsesWith(NewInst);
continue;		continue;
}		}

		if (auto *IE = dyn_cast<InsertElementInst>(User)) {
		if (!Scalar->getType()->isVectorTy()) {
		if (auto *FTy = dyn_cast<FixedVectorType>(User->getType())) {
		Optional<int> InsertIdx = getInsertIndex(IE, 0);
		if (!InsertIdx || *InsertIdx == UndefMaskElem)
		continue;
		auto *It =
		find_if(ShuffledInserts, [IE](const ShuffledInsertData &Data) {
		// Checks if 2 insertelements are from the same buildvector.
		InsertElementInst *VecInsert = Data.InsertElements.front();
		return areTwoInsertFromSameBuildVector(IE, VecInsert);
		});
		int Idx = *InsertIdx;
		if (It == ShuffledInserts.end()) {
		(void)ShuffledInserts.emplace_back();
		It = std::next(ShuffledInserts.begin(), ShuffledInserts.size() - 1);
		// Find the insertvector, vectorized in tree, if any.
		Value *Base = IE;
		InsertElementInst *PrevBase = IE;
		while (auto *IEBase = dyn_cast<InsertElementInst>(Base)) {
		// Build the mask for the vectorized insertelement instructions.
		if (const TreeEntry *E = getTreeEntry(IEBase)) {
		assert(IEBase != PrevBase && "Unexpected in-tree user.");
		IE = PrevBase;
		do {
		int Idx = E->findLaneForValue(Base);
		SmallVectorImpl<int> &Mask = It->ValueMasks[Vec];
		if (Mask.empty())
		Mask.assign(FTy->getNumElements(), UndefMaskElem);
		Mask[Idx] = Idx;
		Base = cast<InsertElementInst>(Base)->getOperand(0);
		} while (E == getTreeEntry(Base));
		break;
		}
		PrevBase = cast<InsertElementInst>(Base);
		Base = PrevBase->getOperand(0);
		// After the vectorization the def-use chain has changed, need to
		RKSimon: I find this control flow very confusing - is the 'cast<InsertElementInst>(Base)' guaranteed to match IEBase? We break after the if() above, so we can't get here from there.
		ABataev: We just iterate through insertelements which are not part of the vectorized buildvector. For example:
		%0 = insertelement %..., %a, 0
		%1 = insertelement %0, %b, 1
		%2 = insertelement %1, %c, 2
		If %c is vectorized, we start looking through the buildvector, trying to find the vectorized base. Start from %2. getTreeEntry(%2) returns nullptr. Go to %1. getTreeEntry(%1) returns nullptr too (it is not a part of the vectorized buildvector). Go to %0. getTreeEntry(%0) is vectorized and returns E. Iterate through all vectorized insertelements and build a mask. Put %2 into the list of insertelements which must be transformed to shuffles. Later, we analyze all inserts between %1 and %2 (including boundaries). If they must be replaced with shuffles, replace them with shuffles; the other insertelements remain as is, and only their base is changed to the shuffles.
		// look through original insertelement instructions, if they get
		// replaced by vector instructions.
		auto It = VectorToInsertElement.find(Base);
		if (It != VectorToInsertElement.end())
		Base = It->second;
		}
		}
		SmallVectorImpl<int> &Mask = It->ValueMasks[Vec];
		if (Mask.empty())
		Mask.assign(FTy->getNumElements(), UndefMaskElem);
		assert(Mask[Idx] == UndefMaskElem &&
		"InsertElementInstruction used already.");
		Mask[Idx] = ExternalUse.Lane;
		It->InsertElements.push_back(IE);
		continue;
		}
		}
		}
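Note on the chain walk above: the lookup described in the inline discussion (start at the last insertelement, follow operand 0 until an insert that belongs to a vectorized tree entry is found) can be modeled outside LLVM. The sketch below is only an illustration under stated assumptions: `ToyInsert`, `ChainWalkResult` and `findVectorizedBase` are hypothetical names, and the `InTree` flag stands in for a non-null result of `getTreeEntry`. It does not build masks and is not the patch's code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an insertelement instruction (not the LLVM class).
struct ToyInsert {
  const ToyInsert *Base; // operand 0: the insert this one builds on, or nullptr
  int Lane;              // constant insert index
  bool InTree;           // assumption: models "getTreeEntry(...) != nullptr"
};

struct ChainWalkResult {
  const ToyInsert *VectorizedBase;        // first in-tree insert found, or nullptr
  std::vector<const ToyInsert *> Skipped; // out-of-tree inserts visited on the way
};

// Follow operand 0 from Start until an in-tree insert is reached.
ChainWalkResult findVectorizedBase(const ToyInsert *Start) {
  assert(Start && "expected a starting insert");
  ChainWalkResult Result{nullptr, {}};
  for (const ToyInsert *Cur = Start; Cur; Cur = Cur->Base) {
    if (Cur->InTree) {
      Result.VectorizedBase = Cur;
      break;
    }
    Result.Skipped.push_back(Cur);
  }
  return Result;
}
```

With a chain shaped like the %0/%1/%2 example, where only %0 is in the tree, the walk starting at %2 skips %2 and %1 and stops at %0.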

// Generate extracts for out-of-tree users.		// Generate extracts for out-of-tree users.
// Find the insertion point for the extractelement lane.		// Find the insertion point for the extractelement lane.
if (auto *VecI = dyn_cast<Instruction>(Vec)) {		if (auto *VecI = dyn_cast<Instruction>(Vec)) {
if (PHINode *PH = dyn_cast<PHINode>(User)) {		if (PHINode *PH = dyn_cast<PHINode>(User)) {
for (int i = 0, e = PH->getNumIncomingValues(); i != e; ++i) {		for (int i = 0, e = PH->getNumIncomingValues(); i != e; ++i) {
if (PH->getIncomingValue(i) == Scalar) {		if (PH->getIncomingValue(i) == Scalar) {
Instruction *IncomingTerminator =		Instruction *IncomingTerminator =
PH->getIncomingBlock(i)->getTerminator();		PH->getIncomingBlock(i)->getTerminator();
Show All 19 Lines	if (auto *VecI = dyn_cast<Instruction>(Vec)) {
Value *NewInst = ExtractAndExtendIfNeeded(Vec);		Value *NewInst = ExtractAndExtendIfNeeded(Vec);
CSEBlocks.insert(&F->getEntryBlock());		CSEBlocks.insert(&F->getEntryBlock());
User->replaceUsesOfWith(Scalar, NewInst);		User->replaceUsesOfWith(Scalar, NewInst);
}		}

LLVM_DEBUG(dbgs() << "SLP: Replaced:" << *User << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Replaced:" << *User << ".\n");
}		}

		auto &&ResizeToVF = [this](Value *Vec, ArrayRef<int> Mask) {
		unsigned VF = Mask.size();
		unsigned VecVF = cast<FixedVectorType>(Vec->getType())->getNumElements();
		if (VF != VecVF) {
		if (any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); })) {
		Vec = Builder.CreateShuffleVector(Vec, Mask);
		return std::make_pair(Vec, true);
		}
		SmallVector<int> ResizeMask(VF, UndefMaskElem);
		for (unsigned I = 0; I < VF; ++I) {
		if (Mask[I] != UndefMaskElem)
		ResizeMask[Mask[I]] = Mask[I];
		}
		Vec = Builder.CreateShuffleVector(Vec, ResizeMask);
		}

		return std::make_pair(Vec, false);
		};
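Note on ResizeToVF: the mask arithmetic in the lambda above can be checked in isolation. The following is a minimal sketch, assuming plain `std::vector<int>` masks and `-1` in place of `UndefMaskElem`; `resizeMaskFor` is a hypothetical helper that returns the mask that would be handed to the shuffle builder (empty when no shuffle is needed) together with the same boolean the lambda returns.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

constexpr int kUndef = -1; // assumption: stands in for UndefMaskElem

// Models the mask selection of ResizeToVF for a source vector with VecVF lanes.
// first:  mask to apply to the source (empty means "no shuffle needed")
// second: true when the original mask was applied directly
std::pair<std::vector<int>, bool> resizeMaskFor(const std::vector<int> &Mask,
                                                unsigned VecVF) {
  const unsigned VF = static_cast<unsigned>(Mask.size());
  if (VF == VecVF)
    return {std::vector<int>{}, false};
  // Some lane reads past the target width: apply the mask itself.
  if (std::any_of(Mask.begin(), Mask.end(),
                  [VF](int Idx) { return Idx >= static_cast<int>(VF); }))
    return {Mask, true};
  // Otherwise only widen or narrow, keeping each used lane in place.
  std::vector<int> Resize(VF, kUndef);
  for (unsigned I = 0; I < VF; ++I) {
    if (Mask[I] == kUndef)
      continue;
    assert(Mask[I] >= 0 && "only kUndef may be negative");
    Resize[Mask[I]] = Mask[I];
  }
  return {Resize, false};
}
```

For a two-lane source and the four-lane mask `<1, undef, 0, undef>`, this yields the widening mask `<0, 1, undef, undef>`, which has the same shape as the resize shuffles in the updated test expectations.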
		// Perform shuffling of the vectorize tree entries for better handling of
		// external extracts.
		for (int I = 0, E = ShuffledInserts.size(); I < E; ++I) {
		// Find the first and the last instruction in the list of insertelements.
		sort(ShuffledInserts[I].InsertElements, isFirstInsertElement);
		InsertElementInst *FirstInsert = ShuffledInserts[I].InsertElements.front();
		InsertElementInst *LastInsert = ShuffledInserts[I].InsertElements.back();
		Builder.SetInsertPoint(LastInsert);
		auto Vector = ShuffledInserts[I].ValueMasks.takeVector();
		Value *NewInst = performExtractsShuffleAction<Value>(
		makeMutableArrayRef(Vector.data(), Vector.size()),
		FirstInsert->getOperand(0),
		[](Value *Vec) {
		return cast<VectorType>(Vec->getType())
		->getElementCount()
		.getKnownMinValue();
		},
		ResizeToVF,
		[this](ArrayRef<int> Mask, ArrayRef<Value *> Vals) {
		assert((Vals.size() == 1 || Vals.size() == 2) &&
		"Expected exactly 1 or 2 input values.");
		if (Vals.size() == 1) {
		// Do not create shuffle if the mask is a simple identity
		// non-resizing mask.
		if (Mask.size() != cast<FixedVectorType>(Vals.front()->getType())
		->getNumElements() ||
		!ShuffleVectorInst::isIdentityMask(Mask))
		return Builder.CreateShuffleVector(Vals.front(), Mask);
		return Vals.front();
		}
		return Builder.CreateShuffleVector(Vals.front(), Vals.back(), Mask);
		});
		auto It = ShuffledInserts[I].InsertElements.rbegin();
		// Rebuild buildvector chain.
		InsertElementInst *II = nullptr;
		if (It != ShuffledInserts[I].InsertElements.rend())
		II = *It;
		SmallVector<Instruction *> Inserts;
		while (It != ShuffledInserts[I].InsertElements.rend()) {
		assert(II && "Must be an insertelement instruction.");
		if (*It == II)
		++It;
		else
		Inserts.push_back(cast<Instruction>(II));
		II = dyn_cast<InsertElementInst>(II->getOperand(0));
		}
		for (Instruction *II : reverse(Inserts)) {
		II->replaceUsesOfWith(II->getOperand(0), NewInst);
		if (auto *I = dyn_cast<Instruction>(NewInst))
		II->moveAfter(I);
		NewInst = II;
		}
		for (InsertElementInst *IE : reverse(ShuffledInserts[I].InsertElements)) {
		IE->replaceUsesOfWith(IE->getOperand(1),
		PoisonValue::get(IE->getOperand(1)->getType()));
		eraseInstruction(IE);
		}
		LastInsert->replaceAllUsesWith(NewInst);
		CSEBlocks.insert(LastInsert->getParent());
		}

// For each vectorized value:		// For each vectorized value:
for (auto &TEPtr : VectorizableTree) {		for (auto &TEPtr : VectorizableTree) {
TreeEntry *Entry = TEPtr.get();		TreeEntry *Entry = TEPtr.get();

// No need to handle users of gathered values.		// No need to handle users of gathered values.
if (Entry->State == TreeEntry::NeedToGather)		if (Entry->State == TreeEntry::NeedToGather)
continue;		continue;

▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	for (Instruction *I : GatherShuffleSeq) {

// We can hoist this instruction. Move it to the pre-header.		// We can hoist this instruction. Move it to the pre-header.
I->moveBefore(PreHeader->getTerminator());		I->moveBefore(PreHeader->getTerminator());
}		}

// Make a list of all reachable blocks in our CSE queue.		// Make a list of all reachable blocks in our CSE queue.
SmallVector<const DomTreeNode *, 8> CSEWorkList;		SmallVector<const DomTreeNode *, 8> CSEWorkList;
CSEWorkList.reserve(CSEBlocks.size());		CSEWorkList.reserve(CSEBlocks.size());
for (BasicBlock *BB : CSEBlocks)		for (BasicBlock *BB : CSEBlocks)
		fhahn: Is it possible that the relative order of elements that compare as equal matters in the code below? With stable_sort, I am not seeing the crash.
		ABataev: Let me check, yeah, most probably caused by the libc++ diff. I used sort here as I hoped there should not be difference between sort and stable sort results.
		ABataev: Could you check again after f9c806ae5c53c990a935c46ba351cdcfb1271c58?
		fhahn: It doesn't crash any longer, thanks!
		ABataev: Great!
if (DomTreeNode *N = DT->getNode(BB)) {		if (DomTreeNode *N = DT->getNode(BB)) {
assert(DT->isReachableFromEntry(N));		assert(DT->isReachableFromEntry(N));
CSEWorkList.push_back(N);		CSEWorkList.push_back(N);
}		}
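Note on the sort versus stable_sort exchange above: `std::sort` leaves the relative order of elements that compare equal unspecified, so different standard library implementations may order them differently, while `std::stable_sort` preserves their input order. The sketch below only illustrates that guarantee; `Item` and `sortStableByKey` are hypothetical names and are unrelated to the comparator used in the patch.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical element type: Key is what the comparator sees, Tag records input order.
struct Item {
  int Key;
  std::string Tag;
};

// Sort by Key only; elements with equal keys keep their original relative order.
std::vector<Item> sortStableByKey(std::vector<Item> Items) {
  std::stable_sort(Items.begin(), Items.end(),
                   [](const Item &A, const Item &B) { return A.Key < B.Key; });
  return Items;
}
```

Code that depends on the order of equal elements after `std::sort` can behave differently between standard libraries, which is consistent with the libc++ difference mentioned in the thread.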

// Sort blocks by domination. This ensures we visit a block after all blocks		// Sort blocks by domination. This ensures we visit a block after all blocks
// dominating it are visited.		// dominating it are visited.
llvm::sort(CSEWorkList, [](const DomTreeNode *A, const DomTreeNode *B) {		llvm::sort(CSEWorkList, [](const DomTreeNode *A, const DomTreeNode *B) {

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i64 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i64 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i64 0
	; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i64 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i64 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i64 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i64 0
	; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]			; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]
	;			;
	%v0.0 = extractelement <2 x i32> %v0, i32 0			%v0.0 = extractelement <2 x i32> %v0, i32 0
	%v0.1 = extractelement <2 x i32> %v0, i32 1			%v0.1 = extractelement <2 x i32> %v0, i32 1

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i64 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i64 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i64 0
	; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i64 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i64 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i64 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i64 0
	; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]			; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]
	;			;
	%v0.0 = extractelement <2 x i32> %v0, i32 0			%v0.0 = extractelement <2 x i32> %v0, i32 0
	%v0.1 = extractelement <2 x i32> %v0, i32 1			%v0.1 = extractelement <2 x i32> %v0, i32 1

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute-inseltpoison.ll

	; SSE-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; SSE-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; SSE-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4			; SSE-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4
	; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> undef, <2 x i32> <i32 1, i32 2>			; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> undef, <2 x i32> <i32 1, i32 2>
	; SSE-NEXT: [[TMP4:%.*]] = fcmp uno <2 x float> [[TMP2]], [[TMP3]]			; SSE-NEXT: [[TMP4:%.*]] = fcmp uno <2 x float> [[TMP2]], [[TMP3]]
	; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 3, i32 0>			; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 3, i32 0>
	; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> poison, float [[B3]], i64 0			; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> poison, float [[B3]], i64 0
	; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[B0]], i64 1			; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[B0]], i64 1
	; SSE-NEXT: [[TMP8:%.*]] = fcmp ord <2 x float> [[TMP5]], [[TMP7]]			; SSE-NEXT: [[TMP8:%.*]] = fcmp ord <2 x float> [[TMP5]], [[TMP7]]
	; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i1> [[TMP8]], <2 x i1> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i1> [[TMP8]], <2 x i1> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[D0:%.*]] = shufflevector <2 x i1> [[TMP8]], <2 x i1> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP10:%.*]] = shufflevector <2 x i1> [[TMP4]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SSE-NEXT: [[TMP10:%.*]] = shufflevector <2 x i1> [[TMP4]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SSE-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[D0]], <4 x i1> [[TMP10]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>			; SSE-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[TMP9]], <4 x i1> [[TMP10]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>
	; SSE-NEXT: [[D3:%.*]] = shufflevector <4 x i1> [[D21]], <4 x i1> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 2, i32 4>			; SSE-NEXT: [[TMP11:%.*]] = shufflevector <2 x i1> [[TMP8]], <2 x i1> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>			; SSE-NEXT: [[TMP12:%.*]] = shufflevector <4 x i1> [[D21]], <4 x i1> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 2, i32 4>
				; SSE-NEXT: [[R:%.*]] = sext <4 x i1> [[TMP12]] to <4 x i32>
	; SSE-NEXT: ret <4 x i32> [[R]]			; SSE-NEXT: ret <4 x i32> [[R]]
	;			;
	; AVX-LABEL: @fcmp_ord_uno_v4i32(			; AVX-LABEL: @fcmp_ord_uno_v4i32(
	; AVX-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i64 0			; AVX-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i64 0
	; AVX-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i64 3			; AVX-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i64 3
	; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1			; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1
	; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3			; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3
	; AVX-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4			; AVX-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute.ll

	; SSE-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; SSE-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; SSE-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4			; SSE-NEXT: [[B3:%.*]] = load float, float* [[P3]], align 4
	; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> undef, <2 x i32> <i32 1, i32 2>			; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> undef, <2 x i32> <i32 1, i32 2>
	; SSE-NEXT: [[TMP4:%.*]] = fcmp uno <2 x float> [[TMP2]], [[TMP3]]			; SSE-NEXT: [[TMP4:%.*]] = fcmp uno <2 x float> [[TMP2]], [[TMP3]]
	; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 3, i32 0>			; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 3, i32 0>
	; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> poison, float [[B3]], i64 0			; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> poison, float [[B3]], i64 0
	; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[B0]], i64 1			; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[B0]], i64 1
	; SSE-NEXT: [[TMP8:%.*]] = fcmp ord <2 x float> [[TMP5]], [[TMP7]]			; SSE-NEXT: [[TMP8:%.*]] = fcmp ord <2 x float> [[TMP5]], [[TMP7]]
	; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i1> [[TMP8]], <2 x i1> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>			; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i1> [[TMP8]], <2 x i1> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[D0:%.*]] = shufflevector <2 x i1> [[TMP8]], <2 x i1> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP10:%.*]] = shufflevector <2 x i1> [[TMP4]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SSE-NEXT: [[TMP10:%.*]] = shufflevector <2 x i1> [[TMP4]], <2 x i1> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SSE-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[D0]], <4 x i1> [[TMP10]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>			; SSE-NEXT: [[D21:%.*]] = shufflevector <4 x i1> [[TMP9]], <4 x i1> [[TMP10]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>
	; SSE-NEXT: [[D3:%.*]] = shufflevector <4 x i1> [[D21]], <4 x i1> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 2, i32 4>			; SSE-NEXT: [[TMP11:%.*]] = shufflevector <2 x i1> [[TMP8]], <2 x i1> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
	; SSE-NEXT: [[R:%.*]] = sext <4 x i1> [[D3]] to <4 x i32>			; SSE-NEXT: [[TMP12:%.*]] = shufflevector <4 x i1> [[D21]], <4 x i1> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 2, i32 4>
				; SSE-NEXT: [[R:%.*]] = sext <4 x i1> [[TMP12]] to <4 x i32>
	; SSE-NEXT: ret <4 x i32> [[R]]			; SSE-NEXT: ret <4 x i32> [[R]]
	;			;
	; AVX-LABEL: @fcmp_ord_uno_v4i32(			; AVX-LABEL: @fcmp_ord_uno_v4i32(
	; AVX-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i64 0			; AVX-NEXT: [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i64 0
	; AVX-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i64 3			; AVX-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i64 3
	; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1			; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1
	; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3			; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3
	; AVX-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4			; AVX-NEXT: [[B0:%.*]] = load float, float* [[B]], align 4

llvm/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll

	; AVX-NEXT: [[TMP7:%.*]] = fcmp olt <2 x float> [[TMP6]], <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP7:%.*]] = fcmp olt <2 x float> [[TMP6]], <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP8:%.*]] = select <2 x i1> [[TMP7]], <2 x float> [[TMP6]], <2 x float> <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP8:%.*]] = select <2 x i1> [[TMP7]], <2 x float> [[TMP6]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP9:%.*]] = fcmp olt <2 x float> [[TMP8]], <float -1.000000e+00, float -1.000000e+00>			; AVX-NEXT: [[TMP9:%.*]] = fcmp olt <2 x float> [[TMP8]], <float -1.000000e+00, float -1.000000e+00>
	; AVX-NEXT: [[TMP10:%.*]] = fmul <2 x float> [[TMP8]], zeroinitializer			; AVX-NEXT: [[TMP10:%.*]] = fmul <2 x float> [[TMP8]], zeroinitializer
	; AVX-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP9]], <2 x float> <float -0.000000e+00, float -0.000000e+00>, <2 x float> [[TMP10]]			; AVX-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP9]], <2 x float> <float -0.000000e+00, float -0.000000e+00>, <2 x float> [[TMP10]]
	; AVX-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP11]], i32 0			; AVX-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP11]], i32 0
	; AVX-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP11]], i32 1			; AVX-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP11]], i32 1
	; AVX-NEXT: [[ADD13]] = fadd float [[TMP12]], [[TMP13]]			; AVX-NEXT: [[ADD13]] = fadd float [[TMP12]], [[TMP13]]
	; AVX-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[TMP13]], i32 0			; AVX-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <2 x i32> <i32 1, i32 undef>
	; AVX-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[ADD13]], i32 1			; AVX-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[ADD13]], i32 1
	; AVX-NEXT: [[TMP16:%.*]] = fcmp olt <2 x float> [[TMP15]], <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP16:%.*]] = fcmp olt <2 x float> [[TMP15]], <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP17:%.*]] = select <2 x i1> [[TMP16]], <2 x float> [[TMP15]], <2 x float> <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP17:%.*]] = select <2 x i1> [[TMP16]], <2 x float> [[TMP15]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP18:%.*]] = fcmp olt <2 x float> [[TMP17]], <float -1.000000e+00, float -1.000000e+00>			; AVX-NEXT: [[TMP18:%.*]] = fcmp olt <2 x float> [[TMP17]], <float -1.000000e+00, float -1.000000e+00>
	; AVX-NEXT: [[TMP19]] = select <2 x i1> [[TMP18]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP17]]			; AVX-NEXT: [[TMP19]] = select <2 x i1> [[TMP18]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP17]]
	; AVX-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32			; AVX-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32
	; AVX-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; AVX-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; AVX: for.end:			; AVX: for.end:

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]			; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[TMP7]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP13]], <2 x double> [[TMP6]], <2 x i32> <i32 3, i32 1>
	; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP14]], undef			; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP14]], undef
	; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [			; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [
	; CHECK-NEXT: i32 0, label [[BB2:%.*]]			; CHECK-NEXT: i32 0, label [[BB2:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: br label [[LABEL:%.*]]			; CHECK-NEXT: br label [[LABEL:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: br label [[LABEL]]			; CHECK-NEXT: br label [[LABEL]]

llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll

	}			}

	define fastcc void @dct36(double* %inbuf) {			define fastcc void @dct36(double* %inbuf) {
	; CHECK-LABEL: @dct36(			; CHECK-LABEL: @dct36(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, double* [[INBUF:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, double* [[INBUF:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[INBUF]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[INBUF]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[ARRAYIDX44]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX44]] to <2 x double>*			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%arrayidx41 = getelementptr inbounds double, double* %inbuf, i64 2			%arrayidx41 = getelementptr inbounds double, double* %inbuf, i64 2
	%arrayidx44 = getelementptr inbounds double, double* %inbuf, i64 1			%arrayidx44 = getelementptr inbounds double, double* %inbuf, i64 1
	%0 = load double, double* %arrayidx44, align 8			%0 = load double, double* %arrayidx44, align 8
	%add46 = fadd double %0, undef			%add46 = fadd double %0, undef
	store double %add46, double* %arrayidx41, align 8			store double %add46, double* %arrayidx41, align 8
	%1 = load double, double* %inbuf, align 8			%1 = load double, double* %inbuf, align 8
	%add49 = fadd double %1, %0			%add49 = fadd double %1, %0
	store double %add49, double* %arrayidx44, align 8			store double %add49, double* %arrayidx44, align 8
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-darwin13.3.0"			target triple = "x86_64-apple-darwin13.3.0"

	define void @_foo(double %p1, double %p2, double %p3) #0 {			define void @_foo(double %p1, double %p2, double %p3) #0 {
	; CHECK-LABEL: @_foo(			; CHECK-LABEL: @_foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TAB1:%.*]] = alloca [256 x i32], align 16			; CHECK-NEXT: [[TAB1:%.*]] = alloca [256 x i32], align 16
	; CHECK-NEXT: [[TAB2:%.*]] = alloca [256 x i32], align 16			; CHECK-NEXT: [[TAB2:%.*]] = alloca [256 x i32], align 16
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[MUL19:%.*]] = fmul double [[P1:%.*]], 1.638400e+04
	; CHECK-NEXT: [[MUL20:%.*]] = fmul double [[P3:%.*]], 1.638400e+04			; CHECK-NEXT: [[MUL20:%.*]] = fmul double [[P3:%.*]], 1.638400e+04
	; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03			; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03
	; CHECK-NEXT: [[MUL21:%.*]] = fmul double [[P2:%.*]], 1.638400e+04			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[P1:%.*]], i32 0
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[P2:%.*]], i32 1
				; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.638400e+04, double 1.638400e+04>
				; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> <double 0.000000e+00, double poison>, double [[ADD]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[T_0259:%.*]] = phi double [ 0.000000e+00, [[BB1]] ], [ [[ADD27:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x double> [ [[TMP3]], [[BB1]] ], [ [[TMP6:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[P3_ADDR_0258:%.*]] = phi double [ [[ADD]], [[BB1]] ], [ [[ADD28:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[TMP4]])
	; CHECK-NEXT: [[VECINIT_I_I237:%.*]] = insertelement <2 x double> poison, double [[T_0259]], i32 0
	; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I237]])
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, [[TBAA0:!tbaa !.*]]			; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, !tbaa [[TBAA0:![0-9]+]]
	; CHECK-NEXT: [[VECINIT_I_I:%.*]] = insertelement <2 x double> poison, double [[P3_ADDR_0258]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I]])			; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[TMP5]])
	; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, [[TBAA0]]			; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[ADD27]] = fadd double [[MUL19]], [[T_0259]]			; CHECK-NEXT: [[TMP6]] = fadd <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[ADD28]] = fadd double [[MUL21]], [[P3_ADDR_0258]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%tab1 = alloca [256 x i32], align 16			%tab1 = alloca [256 x i32], align 16

llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-darwin13.3.0"			target triple = "x86_64-apple-darwin13.3.0"

	define void @_foo(double %p1, double %p2, double %p3) #0 {			define void @_foo(double %p1, double %p2, double %p3) #0 {
	; CHECK-LABEL: @_foo(			; CHECK-LABEL: @_foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TAB1:%.*]] = alloca [256 x i32], align 16			; CHECK-NEXT: [[TAB1:%.*]] = alloca [256 x i32], align 16
	; CHECK-NEXT: [[TAB2:%.*]] = alloca [256 x i32], align 16			; CHECK-NEXT: [[TAB2:%.*]] = alloca [256 x i32], align 16
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[MUL19:%.*]] = fmul double [[P1:%.*]], 1.638400e+04
	; CHECK-NEXT: [[MUL20:%.*]] = fmul double [[P3:%.*]], 1.638400e+04			; CHECK-NEXT: [[MUL20:%.*]] = fmul double [[P3:%.*]], 1.638400e+04
	; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03			; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03
	; CHECK-NEXT: [[MUL21:%.*]] = fmul double [[P2:%.*]], 1.638400e+04			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[P1:%.*]], i32 0
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[P2:%.*]], i32 1
				; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.638400e+04, double 1.638400e+04>
				; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> <double 0.000000e+00, double poison>, double [[ADD]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[T_0259:%.*]] = phi double [ 0.000000e+00, [[BB1]] ], [ [[ADD27:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x double> [ [[TMP3]], [[BB1]] ], [ [[TMP6:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[P3_ADDR_0258:%.*]] = phi double [ [[ADD]], [[BB1]] ], [ [[ADD28:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[TMP4]])
	; CHECK-NEXT: [[VECINIT_I_I237:%.*]] = insertelement <2 x double> undef, double [[T_0259]], i32 0
	; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I237]])
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, !tbaa !0			; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, !tbaa [[TBAA0:![0-9]+]]
	; CHECK-NEXT: [[VECINIT_I_I:%.*]] = insertelement <2 x double> undef, double [[P3_ADDR_0258]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I]])			; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[TMP5]])
	; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, !tbaa !0			; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[ADD27]] = fadd double [[MUL19]], [[T_0259]]			; CHECK-NEXT: [[TMP6]] = fadd <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[ADD28]] = fadd double [[MUL21]], [[P3_ADDR_0258]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%tab1 = alloca [256 x i32], align 16			%tab1 = alloca [256 x i32], align 16

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

	%struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601 = type { %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600, %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 }			%struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601 = type { %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600, %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 }
	%struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 = type { double, double, double }			%struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 = type { double, double, double }

	define void @_Z8radianceRK3RayiPt() #0 {			define void @_Z8radianceRK3RayiPt() #0 {
	; CHECK-LABEL: @_Z8radianceRK3RayiPt(			; CHECK-LABEL: @_Z8radianceRK3RayiPt(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]
	; CHECK: if.then38:			; CHECK: if.then38:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double undef, double poison>, double undef, i32 1			; CHECK-NEXT: [[TMP0:%.*]] = fmul <2 x double> undef, undef
	; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> undef, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = fsub <2 x double> undef, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> undef, [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> undef, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> undef, [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> undef, [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> undef, [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> undef, [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> undef, [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> undef, [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> undef, [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = fmul <2 x double> undef, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x double> undef, [[TMP6]]
	; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY_5_11_53_95_137_191_197_203_239_257_263_269_275_281_287_293_383_437_443_455_461_599_601:%.*]], %struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601* undef, i64 0, i32 1, i32 0			; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY_5_11_53_95_137_191_197_203_239_257_263_269_275_281_287_293_383_437_443_455_461_599_601:%.*]], %struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[AGG_TMP74663_SROA_0_0_IDX]] to <2 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[AGG_TMP74663_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8			; CHECK-NEXT: store <2 x double> [[TMP6]], <2 x double>* [[TMP7]], align 8
	; CHECK-NEXT: br label [[RETURN:%.*]]			; CHECK-NEXT: br label [[RETURN:%.*]]
	; CHECK: if.then78:			; CHECK: if.then78:
	; CHECK-NEXT: br label [[RETURN]]			; CHECK-NEXT: br label [[RETURN]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br i1 undef, label %if.then78, label %if.then38			br i1 undef, label %if.then78, label %if.then38

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

	; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0			; CHECK-NEXT: [[X0:%.*]] = extractelement <2 x float> [[X:%.*]], i32 0
	; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1			; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x float> [[X]], i32 1
	; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[X1]]			; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[X1]]
	; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]			; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]
	; CHECK-NEXT: ret float [[ADD]]			; CHECK-NEXT: ret float [[ADD]]
	;			;
	; THRESH1-LABEL: @f_used_twice_in_tree(			; THRESH1-LABEL: @f_used_twice_in_tree(
	; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH1-NEXT: [[TMP1:%.*]] = shufflevector <2 x float> [[X:%.*]], <2 x float> poison, <2 x i32> <i32 1, i32 1>
	; THRESH1-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH1-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], [[X]]
	; THRESH1-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH1-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
	; THRESH1-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]			; THRESH1-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	; THRESH1-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; THRESH1-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH1-NEXT: ret float [[ADD]]			; THRESH1-NEXT: ret float [[ADD]]
	;			;
	; THRESH2-LABEL: @f_used_twice_in_tree(			; THRESH2-LABEL: @f_used_twice_in_tree(
	; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH2-NEXT: [[TMP1:%.*]] = shufflevector <2 x float> [[X:%.*]], <2 x float> poison, <2 x i32> <i32 1, i32 1>
	; THRESH2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH2-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], [[X]]
	; THRESH2-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH2-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
	; THRESH2-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]			; THRESH2-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	; THRESH2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; THRESH2-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH2-NEXT: ret float [[ADD]]			; THRESH2-NEXT: ret float [[ADD]]
	;			;
	%x0 = extractelement <2 x float> %x, i32 0			%x0 = extractelement <2 x float> %x, i32 0
	%x1 = extractelement <2 x float> %x, i32 1			%x1 = extractelement <2 x float> %x, i32 1
	%x0x0 = fmul float %x0, %x1			%x0x0 = fmul float %x0, %x1
	%x1x1 = fmul float %x1, %x1			%x1x1 = fmul float %x1, %x1
	%add = fadd float %x0x0, %x1x1			%add = fadd float %x0x0, %x1x1
	ret float %add			ret float %add
	}			}

llvm/test/Transforms/SLPVectorizer/X86/extracts-with-undefs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s

	define void @test() {			define void @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[BODY:%.*]]			; CHECK-NEXT: br label [[BODY:%.*]]
	; CHECK: body:			; CHECK: body:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x double> [ zeroinitializer, [[ENTRY:%.*]] ], [ zeroinitializer, [[BODY]] ]			; CHECK-NEXT: [[PHI1:%.*]] = phi double [ 0.000000e+00, [[ENTRY:%.*]] ], [ 0.000000e+00, [[BODY]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x double> [[TMP0]], i32 1			; CHECK-NEXT: [[PHI2:%.*]] = phi double [ 0.000000e+00, [[ENTRY]] ], [ 0.000000e+00, [[BODY]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP1]], i32 1			; CHECK-NEXT: [[MUL_I478_I:%.*]] = fmul fast double [[PHI1]], 0.000000e+00
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x double> [[TMP2]], zeroinitializer			; CHECK-NEXT: [[MUL7_I485_I:%.*]] = fmul fast double undef, 0.000000e+00
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP3]], i32 0			; CHECK-NEXT: [[ADD8_I_I:%.*]] = fadd fast double [[MUL_I478_I]], [[MUL7_I485_I]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP3]], i32 1
	; CHECK-NEXT: [[ADD8_I_I:%.*]] = fadd fast double [[TMP5]], [[TMP4]]
	; CHECK-NEXT: [[CMP42_I:%.*]] = fcmp fast ole double [[ADD8_I_I]], 0.000000e+00			; CHECK-NEXT: [[CMP42_I:%.*]] = fcmp fast ole double [[ADD8_I_I]], 0.000000e+00
	; CHECK-NEXT: br i1 false, label [[BODY]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 false, label [[BODY]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: br i1 false, label [[IF_THEN135_I:%.*]], label [[IF_END209_I:%.*]]			; CHECK-NEXT: br i1 false, label [[IF_THEN135_I:%.*]], label [[IF_END209_I:%.*]]
	; CHECK: if.then135.i:			; CHECK: if.then135.i:
	; CHECK-NEXT: [[TMP6:%.*]] = fcmp fast olt <2 x double> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[CMP145_I:%.*]] = fcmp fast olt double [[PHI1]], 0.000000e+00
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i1> [[TMP6]], i32 0			; CHECK-NEXT: [[CMP152_I:%.*]] = fcmp fast olt double [[PHI2]], 0.000000e+00
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i1> <i1 poison, i1 false>, i1 [[TMP7]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i1> <i1 poison, i1 false>, i1 [[CMP152_I]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = select <2 x i1> [[TMP8]], <2 x double> zeroinitializer, <2 x double> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = select <2 x i1> [[TMP0]], <2 x double> zeroinitializer, <2 x double> zeroinitializer
	; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <2 x double> zeroinitializer, [[TMP9]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <2 x double> zeroinitializer, [[TMP1]]
	; CHECK-NEXT: [[TMP11:%.*]] = fmul fast <2 x double> [[TMP10]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x double> [[TMP2]], zeroinitializer
	; CHECK-NEXT: [[TMP12:%.*]] = fadd fast <2 x double> [[TMP11]], zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = fadd fast <2 x double> [[TMP3]], zeroinitializer
	; CHECK-NEXT: br label [[IF_END209_I]]			; CHECK-NEXT: br label [[IF_END209_I]]
	; CHECK: if.end209.i:			; CHECK: if.end209.i:
	; CHECK-NEXT: [[TMP13:%.*]] = phi <2 x double> [ [[TMP12]], [[IF_THEN135_I]] ], [ zeroinitializer, [[EXIT]] ]			; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x double> [ [[TMP4]], [[IF_THEN135_I]] ], [ zeroinitializer, [[EXIT]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %body			br label %body

	body:			body:
	%phi1 = phi double [ 0.000000e+00, %entry ], [ 0.000000e+00, %body ]			%phi1 = phi double [ 0.000000e+00, %entry ], [ 0.000000e+00, %body ]
	%phi2 = phi double [ 0.000000e+00, %entry ], [ 0.000000e+00, %body ]			%phi2 = phi double [ 0.000000e+00, %entry ], [ 0.000000e+00, %body ]

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

	; THRESH-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; THRESH-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	; THRESH-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])			; THRESH-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])
	; THRESH-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]			; THRESH-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]
	; THRESH-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]			; THRESH-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]
	; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]			; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]
	; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i1> poison, i1 [[TMP12]], i32 0			; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i1> poison, i1 [[TMP12]], i32 0
	; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i1> [[TMP13]], i1 [[TMP5]], i32 1			; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i1> [[TMP13]], i1 [[TMP5]], i32 1
	; THRESH-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> poison, i32 [[TMP11]], i32 0			; THRESH-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> poison, i32 [[TMP11]], i32 0
	; THRESH-NEXT: [[TMP16:%.*]] = insertelement <2 x i32> [[TMP15]], i32 [[TMP3]], i32 1			; THRESH-NEXT: [[TMP16:%.*]] = shufflevector <2 x i32> [[TMP15]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 2>
	; THRESH-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP8]], i32 0			; THRESH-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP8]], i32 0
	; THRESH-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP4]], i32 1			; THRESH-NEXT: [[TMP18:%.*]] = shufflevector <2 x i32> [[TMP17]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
	; THRESH-NEXT: [[TMP19:%.*]] = select <2 x i1> [[TMP14]], <2 x i32> [[TMP16]], <2 x i32> [[TMP18]]			; THRESH-NEXT: [[TMP19:%.*]] = select <2 x i1> [[TMP14]], <2 x i32> [[TMP16]], <2 x i32> [[TMP18]]
	; THRESH-NEXT: [[TMP20:%.*]] = extractelement <2 x i32> [[TMP19]], i32 0			; THRESH-NEXT: [[TMP20:%.*]] = extractelement <2 x i32> [[TMP19]], i32 0
	; THRESH-NEXT: [[TMP21:%.*]] = extractelement <2 x i32> [[TMP19]], i32 1			; THRESH-NEXT: [[TMP21:%.*]] = extractelement <2 x i32> [[TMP19]], i32 1
	; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]			; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
	; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP20]], i32 [[TMP21]]			; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP20]], i32 [[TMP21]]
	; THRESH-NEXT: ret i32 [[OP_EXTRA1]]			; THRESH-NEXT: ret i32 [[OP_EXTRA1]]
	;			;
	%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

	; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]			; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
	; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]			; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOTHRESHOLD-LABEL: @reschedule_extract(			; NOTHRESHOLD-LABEL: @reschedule_extract(
	; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]			; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
	; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]			; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; MINTREESIZE-LABEL: @reschedule_extract(			; MINTREESIZE-LABEL: @reschedule_extract(
	; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[B:%.*]], i32 3			; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <2 x i32> <i32 0, i32 4>
	; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B]], i32 2			; MINTREESIZE-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1			; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 1, i32 5>
	; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0			; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 2, i32 6>
	; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3			; MINTREESIZE-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 3, i32 7>
	; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2			; MINTREESIZE-NEXT: ret <4 x float> [[TMP2]]
	; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1
	; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0
	; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0
	; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1
	; MINTREESIZE-NEXT: [[TMP11:%.*]] = fadd <4 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
	; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[TMP3]], i32 1
	; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0
	; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[TMP2]], i32 1
	; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
	; MINTREESIZE-NEXT: [[TMP17:%.*]] = insertelement <2 x float> [[TMP16]], float [[TMP1]], i32 1
	; MINTREESIZE-NEXT: ret <4 x float> [[TMP11]]
	;			;
	%a0 = extractelement <4 x float> %a, i32 0			%a0 = extractelement <4 x float> %a, i32 0
	%b0 = extractelement <4 x float> %b, i32 0			%b0 = extractelement <4 x float> %b, i32 0
	%c0 = fadd float %a0, %b0			%c0 = fadd float %a0, %b0
	%v0 = insertelement <4 x float> poison, float %c0, i32 0			%v0 = insertelement <4 x float> poison, float %c0, i32 0
	%a1 = extractelement <4 x float> %a, i32 1			%a1 = extractelement <4 x float> %a, i32 1
	%b1 = extractelement <4 x float> %b, i32 1			%b1 = extractelement <4 x float> %b, i32 1
	%c1 = fadd float %a1, %b1			%c1 = fadd float %a1, %b1
	; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]			; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
	; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]			; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOTHRESHOLD-LABEL: @take_credit(			; NOTHRESHOLD-LABEL: @take_credit(
	; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]			; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
	; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]			; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; MINTREESIZE-LABEL: @take_credit(			; MINTREESIZE-LABEL: @take_credit(
	; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[B:%.*]], i32 3			; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <2 x i32> <i32 0, i32 4>
	; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B]], i32 2			; MINTREESIZE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 1, i32 5>
	; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1			; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 2, i32 6>
	; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0			; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 3, i32 7>
	; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3			; MINTREESIZE-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2			; MINTREESIZE-NEXT: ret <4 x float> [[TMP5]]
	; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1
	; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0
	; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0
	; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1
	; MINTREESIZE-NEXT: [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
	; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[TMP3]], i32 1
	; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0
	; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> [[TMP13]], float [[TMP2]], i32 1
	; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
	; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> [[TMP15]], float [[TMP1]], i32 1
	; MINTREESIZE-NEXT: [[TMP17:%.*]] = fadd <4 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: ret <4 x float> [[TMP17]]
	;			;
	%a0 = extractelement <4 x float> %a, i32 0			%a0 = extractelement <4 x float> %a, i32 0
	%b0 = extractelement <4 x float> %b, i32 0			%b0 = extractelement <4 x float> %b, i32 0
	%c0 = fadd float %a0, %b0			%c0 = fadd float %a0, %b0
	%a1 = extractelement <4 x float> %a, i32 1			%a1 = extractelement <4 x float> %a, i32 1
	%b1 = extractelement <4 x float> %b, i32 1			%b1 = extractelement <4 x float> %b, i32 1
	%c1 = fadd float %a1, %b1			%c1 = fadd float %a1, %b1
	%a2 = extractelement <4 x float> %a, i32 2			%a2 = extractelement <4 x float> %a, i32 2
	Show All 40 Lines
	; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]			; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]
	; THRESHOLD-NEXT: ret <8 x float> [[TMP1]]			; THRESHOLD-NEXT: ret <8 x float> [[TMP1]]
	;			;
	; NOTHRESHOLD-LABEL: @_vadd256(			; NOTHRESHOLD-LABEL: @_vadd256(
	; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]			; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]
	; NOTHRESHOLD-NEXT: ret <8 x float> [[TMP1]]			; NOTHRESHOLD-NEXT: ret <8 x float> [[TMP1]]
	;			;
	; MINTREESIZE-LABEL: @_vadd256(			; MINTREESIZE-LABEL: @_vadd256(
	; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <8 x float> [[B:%.*]], i32 7			; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <2 x i32> <i32 0, i32 8>
	; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[B]], i32 6			; MINTREESIZE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 1, i32 9>
	; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <8 x float> [[B]], i32 5			; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 2, i32 10>
	; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[B]], i32 4			; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 3, i32 11>
	; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <8 x float> [[B]], i32 3			; MINTREESIZE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 4, i32 12>
	; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <8 x float> [[B]], i32 2			; MINTREESIZE-NEXT: [[TMP6:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 5, i32 13>
	; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <8 x float> [[B]], i32 1			; MINTREESIZE-NEXT: [[TMP7:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 6, i32 14>
	; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <8 x float> [[B]], i32 0			; MINTREESIZE-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 7, i32 15>
	; MINTREESIZE-NEXT: [[TMP9:%.*]] = extractelement <8 x float> [[A:%.*]], i32 7			; MINTREESIZE-NEXT: [[TMP9:%.*]] = fadd <8 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: [[TMP10:%.*]] = extractelement <8 x float> [[A]], i32 6			; MINTREESIZE-NEXT: ret <8 x float> [[TMP9]]
	; MINTREESIZE-NEXT: [[TMP11:%.*]] = extractelement <8 x float> [[A]], i32 5
	; MINTREESIZE-NEXT: [[TMP12:%.*]] = extractelement <8 x float> [[A]], i32 4
	; MINTREESIZE-NEXT: [[TMP13:%.*]] = extractelement <8 x float> [[A]], i32 3
	; MINTREESIZE-NEXT: [[TMP14:%.*]] = extractelement <8 x float> [[A]], i32 2
	; MINTREESIZE-NEXT: [[TMP15:%.*]] = extractelement <8 x float> [[A]], i32 1
	; MINTREESIZE-NEXT: [[TMP16:%.*]] = extractelement <8 x float> [[A]], i32 0
	; MINTREESIZE-NEXT: [[TMP17:%.*]] = insertelement <2 x float> poison, float [[TMP16]], i32 0
	; MINTREESIZE-NEXT: [[TMP18:%.*]] = insertelement <2 x float> [[TMP17]], float [[TMP8]], i32 1
	; MINTREESIZE-NEXT: [[TMP19:%.*]] = insertelement <2 x float> poison, float [[TMP15]], i32 0
	; MINTREESIZE-NEXT: [[TMP20:%.*]] = insertelement <2 x float> [[TMP19]], float [[TMP7]], i32 1
	; MINTREESIZE-NEXT: [[TMP21:%.*]] = insertelement <2 x float> poison, float [[TMP14]], i32 0
	; MINTREESIZE-NEXT: [[TMP22:%.*]] = insertelement <2 x float> [[TMP21]], float [[TMP6]], i32 1
	; MINTREESIZE-NEXT: [[TMP23:%.*]] = insertelement <2 x float> poison, float [[TMP13]], i32 0
	; MINTREESIZE-NEXT: [[TMP24:%.*]] = insertelement <2 x float> [[TMP23]], float [[TMP5]], i32 1
	; MINTREESIZE-NEXT: [[TMP25:%.*]] = insertelement <2 x float> poison, float [[TMP12]], i32 0
	; MINTREESIZE-NEXT: [[TMP26:%.*]] = insertelement <2 x float> [[TMP25]], float [[TMP4]], i32 1
	; MINTREESIZE-NEXT: [[TMP27:%.*]] = insertelement <2 x float> poison, float [[TMP11]], i32 0
	; MINTREESIZE-NEXT: [[TMP28:%.*]] = insertelement <2 x float> [[TMP27]], float [[TMP3]], i32 1
	; MINTREESIZE-NEXT: [[TMP29:%.*]] = insertelement <2 x float> poison, float [[TMP10]], i32 0
	; MINTREESIZE-NEXT: [[TMP30:%.*]] = insertelement <2 x float> [[TMP29]], float [[TMP2]], i32 1
	; MINTREESIZE-NEXT: [[TMP31:%.*]] = insertelement <2 x float> poison, float [[TMP9]], i32 0
	; MINTREESIZE-NEXT: [[TMP32:%.*]] = insertelement <2 x float> [[TMP31]], float [[TMP1]], i32 1
	; MINTREESIZE-NEXT: [[TMP33:%.*]] = fadd <8 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: ret <8 x float> [[TMP33]]
	;			;
	%vecext = extractelement <8 x float> %a, i32 0			%vecext = extractelement <8 x float> %a, i32 0
	%vecext1 = extractelement <8 x float> %b, i32 0			%vecext1 = extractelement <8 x float> %b, i32 0
	%add = fadd float %vecext, %vecext1			%add = fadd float %vecext, %vecext1
	%vecext2 = extractelement <8 x float> %a, i32 1			%vecext2 = extractelement <8 x float> %a, i32 1
	%vecext3 = extractelement <8 x float> %b, i32 1			%vecext3 = extractelement <8 x float> %b, i32 1
	%add4 = fadd float %vecext2, %vecext3			%add4 = fadd float %vecext2, %vecext3
	%vecext5 = extractelement <8 x float> %a, i32 2			%vecext5 = extractelement <8 x float> %a, i32 2
	Show All 29 Lines

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

	Show First 20 Lines • Show All 470 Lines • ▼ Show 20 Lines
	; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]			; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
	; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]			; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOTHRESHOLD-LABEL: @reschedule_extract(			; NOTHRESHOLD-LABEL: @reschedule_extract(
	; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]			; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
	; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]			; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; MINTREESIZE-LABEL: @reschedule_extract(			; MINTREESIZE-LABEL: @reschedule_extract(
	; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[B:%.*]], i32 3			; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <2 x i32> <i32 0, i32 4>
	; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B]], i32 2			; MINTREESIZE-NEXT: [[TMP2:%.*]] = fadd <4 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1			; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 1, i32 5>
	; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0			; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 2, i32 6>
	; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3			; MINTREESIZE-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 3, i32 7>
	; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2			; MINTREESIZE-NEXT: ret <4 x float> [[TMP2]]
	; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1
	; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0
	; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0
	; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1
	; MINTREESIZE-NEXT: [[TMP11:%.*]] = fadd <4 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
	; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[TMP3]], i32 1
	; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0
	; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[TMP2]], i32 1
	; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
	; MINTREESIZE-NEXT: [[TMP17:%.*]] = insertelement <2 x float> [[TMP16]], float [[TMP1]], i32 1
	; MINTREESIZE-NEXT: ret <4 x float> [[TMP11]]
	;			;
	%a0 = extractelement <4 x float> %a, i32 0			%a0 = extractelement <4 x float> %a, i32 0
	%b0 = extractelement <4 x float> %b, i32 0			%b0 = extractelement <4 x float> %b, i32 0
	%c0 = fadd float %a0, %b0			%c0 = fadd float %a0, %b0
	%v0 = insertelement <4 x float> undef, float %c0, i32 0			%v0 = insertelement <4 x float> undef, float %c0, i32 0
	%a1 = extractelement <4 x float> %a, i32 1			%a1 = extractelement <4 x float> %a, i32 1
	%b1 = extractelement <4 x float> %b, i32 1			%b1 = extractelement <4 x float> %b, i32 1
	%c1 = fadd float %a1, %b1			%c1 = fadd float %a1, %b1
	Show All 16 Lines
	; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]			; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
	; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]			; THRESHOLD-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; NOTHRESHOLD-LABEL: @take_credit(			; NOTHRESHOLD-LABEL: @take_credit(
	; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]			; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <4 x float> [[A:%.*]], [[B:%.*]]
	; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]			; NOTHRESHOLD-NEXT: ret <4 x float> [[TMP1]]
	;			;
	; MINTREESIZE-LABEL: @take_credit(			; MINTREESIZE-LABEL: @take_credit(
	; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[B:%.*]], i32 3			; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A:%.*]], <4 x float> [[B:%.*]], <2 x i32> <i32 0, i32 4>
	; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[B]], i32 2			; MINTREESIZE-NEXT: [[TMP2:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 1, i32 5>
	; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1			; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 2, i32 6>
	; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0			; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[A]], <4 x float> [[B]], <2 x i32> <i32 3, i32 7>
	; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3			; MINTREESIZE-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2			; MINTREESIZE-NEXT: ret <4 x float> [[TMP5]]
	; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1
	; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0
	; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0
	; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1
	; MINTREESIZE-NEXT: [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
	; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[TMP3]], i32 1
	; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0
	; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> [[TMP13]], float [[TMP2]], i32 1
	; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
	; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> [[TMP15]], float [[TMP1]], i32 1
	; MINTREESIZE-NEXT: [[TMP17:%.*]] = fadd <4 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: ret <4 x float> [[TMP17]]
	;			;
	%a0 = extractelement <4 x float> %a, i32 0			%a0 = extractelement <4 x float> %a, i32 0
	%b0 = extractelement <4 x float> %b, i32 0			%b0 = extractelement <4 x float> %b, i32 0
	%c0 = fadd float %a0, %b0			%c0 = fadd float %a0, %b0
	%a1 = extractelement <4 x float> %a, i32 1			%a1 = extractelement <4 x float> %a, i32 1
	%b1 = extractelement <4 x float> %b, i32 1			%b1 = extractelement <4 x float> %b, i32 1
	%c1 = fadd float %a1, %b1			%c1 = fadd float %a1, %b1
	%a2 = extractelement <4 x float> %a, i32 2			%a2 = extractelement <4 x float> %a, i32 2
	Show All 40 Lines
	; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]			; THRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]
	; THRESHOLD-NEXT: ret <8 x float> [[TMP1]]			; THRESHOLD-NEXT: ret <8 x float> [[TMP1]]
	;			;
	; NOTHRESHOLD-LABEL: @_vadd256(			; NOTHRESHOLD-LABEL: @_vadd256(
	; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]			; NOTHRESHOLD-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]
	; NOTHRESHOLD-NEXT: ret <8 x float> [[TMP1]]			; NOTHRESHOLD-NEXT: ret <8 x float> [[TMP1]]
	;			;
	; MINTREESIZE-LABEL: @_vadd256(			; MINTREESIZE-LABEL: @_vadd256(
	; MINTREESIZE-NEXT: [[TMP1:%.*]] = extractelement <8 x float> [[B:%.*]], i32 7			; MINTREESIZE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <2 x i32> <i32 0, i32 8>
	; MINTREESIZE-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[B]], i32 6			; MINTREESIZE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 1, i32 9>
	; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <8 x float> [[B]], i32 5			; MINTREESIZE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 2, i32 10>
	; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[B]], i32 4			; MINTREESIZE-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 3, i32 11>
	; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <8 x float> [[B]], i32 3			; MINTREESIZE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 4, i32 12>
	; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <8 x float> [[B]], i32 2			; MINTREESIZE-NEXT: [[TMP6:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 5, i32 13>
	; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <8 x float> [[B]], i32 1			; MINTREESIZE-NEXT: [[TMP7:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 6, i32 14>
	; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <8 x float> [[B]], i32 0			; MINTREESIZE-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <2 x i32> <i32 7, i32 15>
	; MINTREESIZE-NEXT: [[TMP9:%.*]] = extractelement <8 x float> [[A:%.*]], i32 7			; MINTREESIZE-NEXT: [[TMP9:%.*]] = fadd <8 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: [[TMP10:%.*]] = extractelement <8 x float> [[A]], i32 6			; MINTREESIZE-NEXT: ret <8 x float> [[TMP9]]
	; MINTREESIZE-NEXT: [[TMP11:%.*]] = extractelement <8 x float> [[A]], i32 5
	; MINTREESIZE-NEXT: [[TMP12:%.*]] = extractelement <8 x float> [[A]], i32 4
	; MINTREESIZE-NEXT: [[TMP13:%.*]] = extractelement <8 x float> [[A]], i32 3
	; MINTREESIZE-NEXT: [[TMP14:%.*]] = extractelement <8 x float> [[A]], i32 2
	; MINTREESIZE-NEXT: [[TMP15:%.*]] = extractelement <8 x float> [[A]], i32 1
	; MINTREESIZE-NEXT: [[TMP16:%.*]] = extractelement <8 x float> [[A]], i32 0
	; MINTREESIZE-NEXT: [[TMP17:%.*]] = insertelement <2 x float> poison, float [[TMP16]], i32 0
	; MINTREESIZE-NEXT: [[TMP18:%.*]] = insertelement <2 x float> [[TMP17]], float [[TMP8]], i32 1
	; MINTREESIZE-NEXT: [[TMP19:%.*]] = insertelement <2 x float> poison, float [[TMP15]], i32 0
	; MINTREESIZE-NEXT: [[TMP20:%.*]] = insertelement <2 x float> [[TMP19]], float [[TMP7]], i32 1
	; MINTREESIZE-NEXT: [[TMP21:%.*]] = insertelement <2 x float> poison, float [[TMP14]], i32 0
	; MINTREESIZE-NEXT: [[TMP22:%.*]] = insertelement <2 x float> [[TMP21]], float [[TMP6]], i32 1
	; MINTREESIZE-NEXT: [[TMP23:%.*]] = insertelement <2 x float> poison, float [[TMP13]], i32 0
	; MINTREESIZE-NEXT: [[TMP24:%.*]] = insertelement <2 x float> [[TMP23]], float [[TMP5]], i32 1
	; MINTREESIZE-NEXT: [[TMP25:%.*]] = insertelement <2 x float> poison, float [[TMP12]], i32 0
	; MINTREESIZE-NEXT: [[TMP26:%.*]] = insertelement <2 x float> [[TMP25]], float [[TMP4]], i32 1
	; MINTREESIZE-NEXT: [[TMP27:%.*]] = insertelement <2 x float> poison, float [[TMP11]], i32 0
	; MINTREESIZE-NEXT: [[TMP28:%.*]] = insertelement <2 x float> [[TMP27]], float [[TMP3]], i32 1
	; MINTREESIZE-NEXT: [[TMP29:%.*]] = insertelement <2 x float> poison, float [[TMP10]], i32 0
	; MINTREESIZE-NEXT: [[TMP30:%.*]] = insertelement <2 x float> [[TMP29]], float [[TMP2]], i32 1
	; MINTREESIZE-NEXT: [[TMP31:%.*]] = insertelement <2 x float> poison, float [[TMP9]], i32 0
	; MINTREESIZE-NEXT: [[TMP32:%.*]] = insertelement <2 x float> [[TMP31]], float [[TMP1]], i32 1
	; MINTREESIZE-NEXT: [[TMP33:%.*]] = fadd <8 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: ret <8 x float> [[TMP33]]
	;			;
	%vecext = extractelement <8 x float> %a, i32 0			%vecext = extractelement <8 x float> %a, i32 0
	%vecext1 = extractelement <8 x float> %b, i32 0			%vecext1 = extractelement <8 x float> %b, i32 0
	%add = fadd float %vecext, %vecext1			%add = fadd float %vecext, %vecext1
	%vecext2 = extractelement <8 x float> %a, i32 1			%vecext2 = extractelement <8 x float> %a, i32 1
	%vecext3 = extractelement <8 x float> %b, i32 1			%vecext3 = extractelement <8 x float> %b, i32 1
	%add4 = fadd float %vecext2, %vecext3			%add4 = fadd float %vecext2, %vecext3
	%vecext5 = extractelement <8 x float> %a, i32 2			%vecext5 = extractelement <8 x float> %a, i32 2
	Show All 29 Lines

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

	Show All 14 Lines
	; CHECK-NEXT: [[TMP3:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP3:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP3]], i32 1
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> poison, [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> poison, [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], poison			; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], poison
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], poison			; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], poison
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[VEC1:%.*]] = insertelement <2 x float> undef, float [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP9]], i32 1			; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP10]], 0
	; CHECK-NEXT: [[VEC2:%.*]] = insertelement <2 x float> [[VEC1]], float [[TMP11]], i32 1			; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP11]], 1
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP9]], i32 2
	; CHECK-NEXT: [[VEC3:%.*]] = insertelement <2 x float> undef, float [[TMP12]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP9]], i32 3
	; CHECK-NEXT: [[VEC4:%.*]] = insertelement <2 x float> [[VEC3]], float [[TMP13]], i32 1
	; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VEC2]], 0
	; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[VEC4]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]
	;			;
	entry:			entry:
	%0 = load float, float* undef, align 4			%0 = load float, float* undef, align 4
	%x = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 0			%x = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 0
	%1 = load float, float* %x, align 16			%1 = load float, float* %x, align 16
	%y = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 1			%y = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 1
	%2 = load float, float* %y, align 4			%2 = load float, float* %y, align 4
	Show All 25 Lines

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s

	@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4
	@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4

	define i32 @fn1() {			define i32 @fn1() {
	; CHECK-LABEL: @fn1(			; CHECK-LABEL: @fn1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4
	; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[TMP0]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> <i32 8, i32 poison, i32 ptrtoint (i32 ()* @fn1 to i32), i32 ptrtoint (i32 ()* @fn1 to i32)>, <4 x i32> [[TMP0]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> <i32 8, i32 poison, i32 ptrtoint (i32 ()* @fn1 to i32), i32 ptrtoint (i32 ()* @fn1 to i32)>, i32 [[TMP2]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>
	; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP3]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
	; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4			; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4			%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4
	%cmp = icmp sgt i32 %0, 0			%cmp = icmp sgt i32 %0, 0
	%cond = select i1 %cmp, i32 8, i32 0			%cond = select i1 %cmp, i32 8, i32 0
	store i32 %cond, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i64 0, i32 3), align 4			store i32 %cond, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i64 0, i32 3), align 4
	Show All 14 Lines

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load.ll

	Show First 20 Lines • Show All 63 Lines • ▼ Show 20 Lines
	define i32 @jumbled-load-multiuses(i32* noalias nocapture %in, i32* noalias nocapture %out) {			define i32 @jumbled-load-multiuses(i32* noalias nocapture %in, i32* noalias nocapture %out) {
	; CHECK-LABEL: @jumbled-load-multiuses(			; CHECK-LABEL: @jumbled-load-multiuses(
	; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0			; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0
	; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3			; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3
	; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1			; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1
	; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2			; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 0, i32 3>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> poison, i32 [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = mul <4 x i32> [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP2]], i32 2
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP5]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[TMP7]], i32 2
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP8]], i32 [[TMP9]], i32 3
	; CHECK-NEXT: [[TMP11:%.*]] = mul <4 x i32> [[TMP2]], [[TMP10]]
	; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0			; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
	; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1			; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
	; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2			; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
	; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3			; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 2, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 2, i32 0>
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* [[TMP12]], align 4			; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* [[TMP5]], align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%in.addr = getelementptr inbounds i32, i32* %in, i64 0			%in.addr = getelementptr inbounds i32, i32* %in, i64 0
	%load.1 = load i32, i32* %in.addr, align 4			%load.1 = load i32, i32* %in.addr, align 4
	%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 3			%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 3
	%load.2 = load i32, i32* %gep.1, align 4			%load.2 = load i32, i32* %gep.1, align 4
	%gep.2 = getelementptr inbounds i32, i32* %in.addr, i64 1			%gep.2 = getelementptr inbounds i32, i32* %in.addr, i64 1
	%load.3 = load i32, i32* %gep.2, align 4			%load.3 = load i32, i32* %gep.2, align 4
	Show All 17 Lines

llvm/test/Transforms/SLPVectorizer/X86/jumbled_store_crash.ll

	Show All 37 Lines
	; CHECK-NEXT: store float [[TMP13]], float* @e, align 4			; CHECK-NEXT: store float [[TMP13]], float* @e, align 4
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP10]], i32 1
	; CHECK-NEXT: store float [[TMP14]], float* @f, align 4			; CHECK-NEXT: store float [[TMP14]], float* @f, align 4
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 14			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 14
	; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 15			; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 15
	; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* @a, align 4			; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* @a, align 4
	; CHECK-NEXT: [[CONV19:%.*]] = sitofp i32 [[TMP15]] to float			; CHECK-NEXT: [[CONV19:%.*]] = sitofp i32 [[TMP15]] to float
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x float> <float poison, float -1.000000e+00, float poison, float -1.000000e+00>, float [[CONV19]], i32 0			; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x float> <float poison, float -1.000000e+00, float poison, float -1.000000e+00>, float [[CONV19]], i32 0
	; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <4 x float> [[TMP16]], <4 x float> [[SHUFFLE]], <4 x i32> <i32 0, i32 1, i32 4, i32 3>
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x float> [[TMP16]], float [[TMP17]], i32 2			; CHECK-NEXT: [[TMP18:%.*]] = fsub <4 x float> [[TMP10]], [[TMP17]]
	; CHECK-NEXT: [[TMP19:%.*]] = fsub <4 x float> [[TMP10]], [[TMP18]]			; CHECK-NEXT: [[TMP19:%.*]] = fadd <4 x float> [[TMP10]], [[TMP17]]
	; CHECK-NEXT: [[TMP20:%.*]] = fadd <4 x float> [[TMP10]], [[TMP18]]			; CHECK-NEXT: [[TMP20:%.*]] = shufflevector <4 x float> [[TMP18]], <4 x float> [[TMP19]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
	; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <4 x float> [[TMP19]], <4 x float> [[TMP20]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>			; CHECK-NEXT: [[TMP21:%.*]] = fptosi <4 x float> [[TMP20]] to <4 x i32>
	; CHECK-NEXT: [[TMP22:%.*]] = fptosi <4 x float> [[TMP21]] to <4 x i32>			; CHECK-NEXT: [[TMP22:%.*]] = bitcast i32* [[ARRAYIDX1]] to <4 x i32>*
	; CHECK-NEXT: [[TMP23:%.*]] = bitcast i32* [[ARRAYIDX1]] to <4 x i32>*			; CHECK-NEXT: store <4 x i32> [[TMP21]], <4 x i32>* [[TMP22]], align 4
	; CHECK-NEXT: store <4 x i32> [[TMP22]], <4 x i32>* [[TMP23]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i32*, i32** @b, align 8			%0 = load i32*, i32** @b, align 8
	%arrayidx = getelementptr inbounds i32, i32* %0, i64 4			%arrayidx = getelementptr inbounds i32, i32* %0, i64 4
	%1 = load i32, i32* %arrayidx, align 4			%1 = load i32, i32* %arrayidx, align 4
	%arrayidx1 = getelementptr inbounds i32, i32* %0, i64 12			%arrayidx1 = getelementptr inbounds i32, i32* %0, i64 12
	%2 = load i32, i32* %arrayidx1, align 4			%2 = load i32, i32* %arrayidx1, align 4

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
	; CHECK-NEXT: [[X0:%.*]] = load float, float* [[GEP0]], align 4			; CHECK-NEXT: [[X0:%.*]] = load float, float* [[GEP0]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP1]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP1]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x float> poison, float [[X0]], i32 0			; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x float> poison, float [[X0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[I21:%.*]] = shufflevector <4 x float> [[I0]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>			; CHECK-NEXT: [[I21:%.*]] = shufflevector <4 x float> [[I0]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[I21]], <4 x float> [[TMP3]], <4 x i32> <i32 4, i32 5, i32 2, i32 5>
	; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I21]], float [[TMP4]], i32 3			; CHECK-NEXT: ret <4 x float> [[TMP4]]
	; CHECK-NEXT: ret <4 x float> [[I3]]
	;			;
	%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0			%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0
	%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1			%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1
	%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2			%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
	%x0 = load float, float* %gep0			%x0 = load float, float* %gep0
	%x1 = load float, float* %gep1			%x1 = load float, float* %gep1
	%x2 = load float, float* %gep2			%x2 = load float, float* %gep2
	%i0 = insertelement <4 x float> poison, float %x0, i32 0			%i0 = insertelement <4 x float> poison, float %x0, i32 0

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 1
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
	; CHECK-NEXT: [[X0:%.*]] = load float, float* [[GEP0]], align 4			; CHECK-NEXT: [[X0:%.*]] = load float, float* [[GEP0]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP1]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP1]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x float> undef, float [[X0]], i32 0			; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x float> undef, float [[X0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[I21:%.*]] = shufflevector <4 x float> [[I0]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>			; CHECK-NEXT: [[I21:%.*]] = shufflevector <4 x float> [[I0]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[I21]], <4 x float> [[TMP3]], <4 x i32> <i32 4, i32 5, i32 2, i32 5>
	; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I21]], float [[TMP4]], i32 3			; CHECK-NEXT: ret <4 x float> [[TMP4]]
	; CHECK-NEXT: ret <4 x float> [[I3]]
	;			;
	%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0			%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0
	%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1			%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1
	%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2			%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
	%x0 = load float, float* %gep0			%x0 = load float, float* %gep0
	%x1 = load float, float* %gep1			%x1 = load float, float* %gep1
	%x2 = load float, float* %gep2			%x2 = load float, float* %gep2
	%i0 = insertelement <4 x float> undef, float %x0, i32 0			%i0 = insertelement <4 x float> undef, float %x0, i32 0

llvm/test/Transforms/SLPVectorizer/X86/ordering-bug.ll

	; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x i64> [ [[TMP0]], [[ENTRY:%.*]] ], [ [[TMP5]], [[WHILE_BODY_LR_PH]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x i64> [ [[TMP0]], [[ENTRY:%.*]] ], [ [[TMP5]], [[WHILE_BODY_LR_PH]] ]
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (%struct.a* @c to <2 x i64>*), align 8			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (%struct.a* @c to <2 x i64>*), align 8
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
	; CHECK-NEXT: [[ICMP_D0:%.*]] = icmp eq i64 [[TMP8]], 0			; CHECK-NEXT: [[ICMP_D0:%.*]] = icmp eq i64 [[TMP8]], 0
	; CHECK-NEXT: br i1 [[ICMP_D0]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]			; CHECK-NEXT: br i1 [[ICMP_D0]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: [[AND0_TMP:%.*]] = and i64 [[TMP8]], 8			; CHECK-NEXT: [[AND0_TMP:%.*]] = and i64 [[TMP8]], 8
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> poison, i64 [[AND0_TMP]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> poison, i64 [[AND0_TMP]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x i64> [[TMP9]], <2 x i64> [[TMP6]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i64> [[TMP9]], i64 [[TMP10]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = and <2 x i64> [[TMP10]], [[TMP7]]
	; CHECK-NEXT: [[TMP12:%.*]] = and <2 x i64> [[TMP11]], [[TMP7]]			; CHECK-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (%struct.a* @a to <2 x i64>*), align 8
	; CHECK-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (%struct.a* @a to <2 x i64>*), align 8
	; CHECK-NEXT: br label [[IF_END]]			; CHECK-NEXT: br label [[IF_END]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%a0 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 0), align 8			%a0 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 0), align 8
	%a1 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 1), align 8			%a1 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 1), align 8
	br i1 %x, label %while.body.lr.ph, label %while.end			br i1 %x, label %while.body.lr.ph, label %while.end

llvm/test/Transforms/SLPVectorizer/X86/phi.ll

	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[TMP3]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = phi <4 x float> [ [[TMP2]], [[ENTRY]] ], [ [[TMP19:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <4 x float> [ [[TMP2]], [[ENTRY]] ], [ [[TMP16:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP7:%.*]] = phi <2 x float> [ [[TMP5]], [[ENTRY]] ], [ [[TMP12:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP7:%.*]] = phi <2 x float> [ [[TMP5]], [[ENTRY]] ], [ [[TMP12:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0
	; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP8]], 7.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP8]], 7.000000e+00
	; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]			; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[TMP9:%.*]] = add nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP9]]			; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX14]], align 4			; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX14]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*
	; CHECK-NEXT: [[TMP12]] = load <2 x float>, <2 x float>* [[TMP11]], align 4			; CHECK-NEXT: [[TMP12]] = load <2 x float>, <2 x float>* [[TMP11]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> [[TMP12]], <4 x i32> <i32 1, i32 undef, i32 2, i32 3>
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> poison, float [[TMP13]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> [[TMP13]], float [[TMP10]], i32 1
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <4 x float> [[TMP14]], float [[TMP10]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = fmul <4 x float> [[TMP14]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>
	; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <2 x float> [[TMP12]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP16]] = fadd <4 x float> [[TMP6]], [[TMP15]]
	; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <4 x float> [[TMP15]], <4 x float> [[TMP16]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP17:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[TMP18:%.*]] = fmul <4 x float> [[TMP17]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP17]], 121
	; CHECK-NEXT: [[TMP19]] = fadd <4 x float> [[TMP6]], [[TMP18]]
	; CHECK-NEXT: [[TMP20:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP20]], 121
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x float> [[TMP19]], i32 0			; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x float> [[TMP16]], i32 0
	; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP21]]			; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP18]]
	; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x float> [[TMP19]], i32 1			; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x float> [[TMP16]], i32 1
	; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP22]]			; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP19]]
	; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x float> [[TMP19]], i32 2			; CHECK-NEXT: [[TMP20:%.*]] = extractelement <4 x float> [[TMP16]], i32 2
	; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP23]]			; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP20]]
	; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x float> [[TMP19]], i32 3			; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x float> [[TMP16]], i32 3
	; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP24]]			; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP21]]
	; CHECK-NEXT: ret float [[ADD31]]			; CHECK-NEXT: ret float [[ADD31]]
	;			;
	entry:			entry:
	%0 = load float, float* %A, align 4			%0 = load float, float* %A, align 4
	%arrayidx1 = getelementptr inbounds float, float* %A, i64 1			%arrayidx1 = getelementptr inbounds float, float* %A, i64 1
	%1 = load float, float* %arrayidx1, align 4			%1 = load float, float* %arrayidx1, align 4
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 2			%arrayidx2 = getelementptr inbounds float, float* %A, i64 2
	%2 = load float, float* %arrayidx2, align 4			%2 = load float, float* %arrayidx2, align 4
	[... 46 lines not shown ...]
	; Make sure the order of phi nodes of different types does not prevent			; Make sure the order of phi nodes of different types does not prevent
	; vectorization of same typed phi nodes.			; vectorization of same typed phi nodes.
	define float @sort_phi_type(float* nocapture readonly %A) {			define float @sort_phi_type(float* nocapture readonly %A) {
	; CHECK-LABEL: @sort_phi_type(			; CHECK-LABEL: @sort_phi_type(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x float> [ <float 1.000000e+01, float 1.000000e+01, float 1.000000e+01, float 1.000000e+01>, [[ENTRY]] ], [ [[TMP9:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x float> [ <float 1.000000e+01, float 1.000000e+01, float 1.000000e+01, float 1.000000e+01>, [[ENTRY]] ], [ [[TMP2:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2]] = fmul <4 x float> [[TMP1]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+02, float 1.110000e+02>
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP2]], float [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP5]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP6]], float [[TMP7]], i32 3
	; CHECK-NEXT: [[TMP9]] = fmul <4 x float> [[TMP8]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+02, float 1.110000e+02>
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], 4			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], 4
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT]], 128			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT]], 128
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP2]], i32 1
	; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP9]], i32 2			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP2]], i32 2
	; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP12]]			; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP5]]
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP9]], i32 3			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP2]], i32 3
	; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP13]]			; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP6]]
	; CHECK-NEXT: ret float [[ADD31]]			; CHECK-NEXT: ret float [[ADD31]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%Y = phi float [ 1.000000e+01, %entry ], [ %mul10, %for.body ]			%Y = phi float [ 1.000000e+01, %entry ], [ %mul10, %for.body ]
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]

llvm/test/Transforms/SLPVectorizer/X86/pr42022-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	; See https://reviews.llvm.org/D70068 and https://reviews.llvm.org/D70587 for context			; See https://reviews.llvm.org/D70068 and https://reviews.llvm.org/D70587 for context

	; Checks that vector insertvalues into the struct become SLP seeds.			; Checks that vector insertvalues into the struct become SLP seeds.
	define { <2 x float>, <2 x float> } @StructOfVectors(float *%Ptr) {			define { <2 x float>, <2 x float> } @StructOfVectors(float *%Ptr) {
	; CHECK-LABEL: @StructOfVectors(			; CHECK-LABEL: @StructOfVectors(
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[PTR:%.*]], i64 0			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[PTR:%.*]], i64 0
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 1
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[VECIN0:%.*]] = insertelement <2 x float> poison, float [[TMP4]], i64 0			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP4]], 0
	; CHECK-NEXT: [[VECIN1:%.*]] = insertelement <2 x float> [[VECIN0]], float [[TMP5]], i64 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { <2 x float>, <2 x float> } [[RET0]], <2 x float> [[TMP5]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
	; CHECK-NEXT: [[VECIN2:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i64 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
	; CHECK-NEXT: [[VECIN3:%.*]] = insertelement <2 x float> [[VECIN2]], float [[TMP7]], i64 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VECIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { <2 x float>, <2 x float> } [[RET0]], <2 x float> [[VECIN3]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[RET1]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
	%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2			%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
	%L2 = load float, float * %GEP2			%L2 = load float, float * %GEP2

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	; See https://reviews.llvm.org/D70068 and https://reviews.llvm.org/D70587 for context			; See https://reviews.llvm.org/D70068 and https://reviews.llvm.org/D70587 for context

	; Checks that vector insertvalues into the struct become SLP seeds.			; Checks that vector insertvalues into the struct become SLP seeds.
	define { <2 x float>, <2 x float> } @StructOfVectors(float *%Ptr) {			define { <2 x float>, <2 x float> } @StructOfVectors(float *%Ptr) {
	; CHECK-LABEL: @StructOfVectors(			; CHECK-LABEL: @StructOfVectors(
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[PTR:%.*]], i64 0			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[PTR:%.*]], i64 0
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 1
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[VECIN0:%.*]] = insertelement <2 x float> undef, float [[TMP4]], i64 0			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP4]], 0
	; CHECK-NEXT: [[VECIN1:%.*]] = insertelement <2 x float> [[VECIN0]], float [[TMP5]], i64 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { <2 x float>, <2 x float> } [[RET0]], <2 x float> [[TMP5]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
	; CHECK-NEXT: [[VECIN2:%.*]] = insertelement <2 x float> undef, float [[TMP6]], i64 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
	; CHECK-NEXT: [[VECIN3:%.*]] = insertelement <2 x float> [[VECIN2]], float [[TMP7]], i64 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VECIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { <2 x float>, <2 x float> } [[RET0]], <2 x float> [[VECIN3]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[RET1]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
	%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2			%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
	%L2 = load float, float * %GEP2			%L2 = load float, float * %GEP2
	[... 25 lines not shown ...]
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] %StructIn0, float [[TMP5]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN0]], float [[TMP5]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
	; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0			; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
	; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] %StructIn2, float [[TMP7]], 1			; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN2]], float [[TMP7]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue [2 x %StructTy] undef, [[STRUCTTY]] %StructIn1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue [2 x %StructTy] undef, [[STRUCTTY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue [2 x %StructTy] [[RET0]], [[STRUCTTY]] %StructIn3, 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue [2 x %StructTy] [[RET0]], [[STRUCTTY]] [[STRUCTIN3]], 1
	; CHECK-NEXT: ret [2 x %StructTy] [[RET1]]			; CHECK-NEXT: ret [2 x %StructTy] [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
	%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2			%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
	%L2 = load float, float * %GEP2			%L2 = load float, float * %GEP2
	[... 23 lines not shown ...]
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 2
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd fast <4 x float> [[TMP2]], <float 1.100000e+01, float 1.200000e+01, float 1.300000e+01, float 1.400000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP3]], i32 0
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[TMP4]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP3]], i32 1
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] %StructIn0, float [[TMP5]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN0]], float [[TMP5]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP3]], i32 2
	; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0			; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCTTY]] undef, float [[TMP6]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP3]], i32 3
	; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] %StructIn2, float [[TMP7]], 1			; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN2]], float [[TMP7]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } undef, [[STRUCTTY]] %StructIn1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } undef, [[STRUCTTY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } [[RET0]], [[STRUCTTY]] %StructIn3, 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], [[STRUCTTY]] } [[RET0]], [[STRUCTTY]] [[STRUCTIN3]], 1
	; CHECK-NEXT: ret { [[STRUCTTY]], [[STRUCTTY]] } [[RET1]]			; CHECK-NEXT: ret { [[STRUCTTY]], [[STRUCTTY]] } [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
	%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2			%GEP2 = getelementptr inbounds float, float* %Ptr, i64 2
	%L2 = load float, float * %GEP2			%L2 = load float, float * %GEP2
	[... 26 lines not shown ...]
	; CHECK-NEXT: [[L2:%.*]] = load float, float* [[GEP2]], align 4			; CHECK-NEXT: [[L2:%.*]] = load float, float* [[GEP2]], align 4
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[PTR]], i64 3
	; CHECK-NEXT: [[L3:%.*]] = load float, float* [[GEP3]], align 4			; CHECK-NEXT: [[L3:%.*]] = load float, float* [[GEP3]], align 4
	; CHECK-NEXT: [[FADD0:%.*]] = fadd fast float [[L0]], 1.100000e+01			; CHECK-NEXT: [[FADD0:%.*]] = fadd fast float [[L0]], 1.100000e+01
	; CHECK-NEXT: [[FADD1:%.*]] = fadd fast float [[L1]], 1.200000e+01			; CHECK-NEXT: [[FADD1:%.*]] = fadd fast float [[L1]], 1.200000e+01
	; CHECK-NEXT: [[FADD2:%.*]] = fadd fast float [[L2]], 1.300000e+01			; CHECK-NEXT: [[FADD2:%.*]] = fadd fast float [[L2]], 1.300000e+01
	; CHECK-NEXT: [[FADD3:%.*]] = fadd fast float [[L3]], 1.400000e+01			; CHECK-NEXT: [[FADD3:%.*]] = fadd fast float [[L3]], 1.400000e+01
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[FADD0]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCTTY:%.*]] undef, float [[FADD0]], 0
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] %StructIn0, float [[FADD1]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCTTY]] [[STRUCTIN0]], float [[FADD1]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], float, float } undef, [[STRUCTTY]] %StructIn1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCTTY]], float, float } undef, [[STRUCTTY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET0]], float [[FADD2]], 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET0]], float [[FADD2]], 1
	; CHECK-NEXT: [[RET2:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET1]], float [[FADD3]], 2			; CHECK-NEXT: [[RET2:%.*]] = insertvalue { [[STRUCTTY]], float, float } [[RET1]], float [[FADD3]], 2
	; CHECK-NEXT: ret { [[STRUCTTY]], float, float } [[RET2]]			; CHECK-NEXT: ret { [[STRUCTTY]], float, float } [[RET2]]
	;			;
	%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0			%GEP0 = getelementptr inbounds float, float* %Ptr, i64 0
	%L0 = load float, float * %GEP0			%L0 = load float, float * %GEP0
	%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1			%GEP1 = getelementptr inbounds float, float* %Ptr, i64 1
	%L1 = load float, float * %GEP1			%L1 = load float, float * %GEP1
	[... 30 lines not shown ...]
	; CHECK-NEXT: [[GEP6:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 6			; CHECK-NEXT: [[GEP6:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 6
	; CHECK-NEXT: [[GEP7:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 7			; CHECK-NEXT: [[GEP7:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 7
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[GEP0]] to <8 x i16>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[GEP0]] to <8 x i16>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 2			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]], align 2
	; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP2]], <i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7, i16 8>			; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP2]], <i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7, i16 8>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i16> [[TMP3]], i32 0
	; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCT1TY:%.*]] undef, i16 [[TMP4]], 0			; CHECK-NEXT: [[STRUCTIN0:%.*]] = insertvalue [[STRUCT1TY:%.*]] undef, i16 [[TMP4]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i16> [[TMP3]], i32 1
	; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCT1TY]] %StructIn0, i16 [[TMP5]], 1			; CHECK-NEXT: [[STRUCTIN1:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN0]], i16 [[TMP5]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <8 x i16> [[TMP3]], i32 2
	; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP6]], 0			; CHECK-NEXT: [[STRUCTIN2:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP6]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i16> [[TMP3]], i32 3
	; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCT1TY]] %StructIn2, i16 [[TMP7]], 1			; CHECK-NEXT: [[STRUCTIN3:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN2]], i16 [[TMP7]], 1
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i16> [[TMP3]], i32 4
	; CHECK-NEXT: [[STRUCTIN4:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP8]], 0			; CHECK-NEXT: [[STRUCTIN4:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP8]], 0
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i16> [[TMP3]], i32 5
	; CHECK-NEXT: [[STRUCTIN5:%.*]] = insertvalue [[STRUCT1TY]] %StructIn4, i16 [[TMP9]], 1			; CHECK-NEXT: [[STRUCTIN5:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN4]], i16 [[TMP9]], 1
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i16> [[TMP3]], i32 6
	; CHECK-NEXT: [[STRUCTIN6:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP10]], 0			; CHECK-NEXT: [[STRUCTIN6:%.*]] = insertvalue [[STRUCT1TY]] undef, i16 [[TMP10]], 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i16> [[TMP3]], i32 7
	; CHECK-NEXT: [[STRUCTIN7:%.*]] = insertvalue [[STRUCT1TY]] %StructIn6, i16 [[TMP11]], 1			; CHECK-NEXT: [[STRUCTIN7:%.*]] = insertvalue [[STRUCT1TY]] [[STRUCTIN6]], i16 [[TMP11]], 1
	; CHECK-NEXT: [[STRUCT2IN0:%.*]] = insertvalue [[STRUCT2TY:%.*]] undef, [[STRUCT1TY]] %StructIn1, 0			; CHECK-NEXT: [[STRUCT2IN0:%.*]] = insertvalue [[STRUCT2TY:%.*]] undef, [[STRUCT1TY]] [[STRUCTIN1]], 0
	; CHECK-NEXT: [[STRUCT2IN1:%.*]] = insertvalue [[STRUCT2TY]] %Struct2In0, [[STRUCT1TY]] %StructIn3, 1			; CHECK-NEXT: [[STRUCT2IN1:%.*]] = insertvalue [[STRUCT2TY]] [[STRUCT2IN0]], [[STRUCT1TY]] [[STRUCTIN3]], 1
	; CHECK-NEXT: [[STRUCT2IN2:%.*]] = insertvalue [[STRUCT2TY]] undef, [[STRUCT1TY]] %StructIn5, 0			; CHECK-NEXT: [[STRUCT2IN2:%.*]] = insertvalue [[STRUCT2TY]] undef, [[STRUCT1TY]] [[STRUCTIN5]], 0
	; CHECK-NEXT: [[STRUCT2IN3:%.*]] = insertvalue [[STRUCT2TY]] %Struct2In2, [[STRUCT1TY]] %StructIn7, 1			; CHECK-NEXT: [[STRUCT2IN3:%.*]] = insertvalue [[STRUCT2TY]] [[STRUCT2IN2]], [[STRUCT1TY]] [[STRUCTIN7]], 1
	; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } undef, [[STRUCT2TY]] %Struct2In1, 0			; CHECK-NEXT: [[RET0:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } undef, [[STRUCT2TY]] [[STRUCT2IN1]], 0
	; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET0]], [[STRUCT2TY]] %Struct2In3, 1			; CHECK-NEXT: [[RET1:%.*]] = insertvalue { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET0]], [[STRUCT2TY]] [[STRUCT2IN3]], 1
				RKSimon: These look like NFC changes by the update script that can probably be pre-committed to reduce the patch?
	; CHECK-NEXT: ret { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET1]]			; CHECK-NEXT: ret { [[STRUCT2TY]], [[STRUCT2TY]] } [[RET1]]
	;			;
	%GEP0 = getelementptr inbounds i16, i16* %Ptr, i64 0			%GEP0 = getelementptr inbounds i16, i16* %Ptr, i64 0
	%L0 = load i16, i16 * %GEP0			%L0 = load i16, i16 * %GEP0
	%GEP1 = getelementptr inbounds i16, i16* %Ptr, i64 1			%GEP1 = getelementptr inbounds i16, i16* %Ptr, i64 1
	%L1 = load i16, i16 * %GEP1			%L1 = load i16, i16 * %GEP1
	%GEP2 = getelementptr inbounds i16, i16* %Ptr, i64 2			%GEP2 = getelementptr inbounds i16, i16* %Ptr, i64 2
	%L2 = load i16, i16 * %GEP2			%L2 = load i16, i16 * %GEP2
	[... 42 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

	[... 10 lines not shown ...]
	; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 0			; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 0
	; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 1			; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 1
	; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 2			; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 2
	; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 3			; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 3
	; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 4			; CHECK-NEXT: [[P4:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 4
	; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 5			; CHECK-NEXT: [[P5:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 5
	; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 6			; CHECK-NEXT: [[P6:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 6
	; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 7			; CHECK-NEXT: [[P7:%.*]] = getelementptr inbounds i16, i16* [[PTR]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <8 x i16> [[LD]], i32 0			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[LD]], <8 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i16> poison, i16 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = add <8 x i16> [[LD]], [[SHUFFLE]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[TMP1]], <8 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
	; CHECK-NEXT: [[TMP2:%.*]] = add <8 x i16> [[LD]], [[SHUFFLE]]			; CHECK-NEXT: store <8 x i16> [[TMP0]], <8 x i16>* [[TMP1]], align 2
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
	; CHECK-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* [[TMP3]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
				lebedev.ri: Thanks! This is clearly an improvement, but these two shuffles are still redundant, because in either case you end up with the 0th element of `LD` in some elements of the output. In this case you could simply drop the first shuffle and do the second one directly.
				ABataev (author): I think that in codegen the first shuffle will simply be dropped (this is an identity shuffle). But I'll check what can be improved here.
				lebedev.ri: Ignoring more complicated cases, perhaps the key point here is that `TMP0` is an identity (=> single-source), non-width-changing shuffle, so it can be naturally dropped. `ShuffleVectorInst::isIdentityMask()` might be relevant.
				ABataev (author): Agreed. Will check what can be done here to improve it.
	;			;
	; YAML: Pass: slp-vectorizer			; YAML: Pass: slp-vectorizer
	; YAML-NEXT: Name: StoresVectorized			; YAML-NEXT: Name: StoresVectorized
	; YAML-NEXT: Function: fextr			; YAML-NEXT: Function: fextr
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'Stores SLP vectorized with cost '			; YAML-NEXT: - String: 'Stores SLP vectorized with cost '
	; YAML-NEXT: - Cost: '-20'			; YAML-NEXT: - Cost: '-20'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	[... 41 lines not shown ...]
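
The identity-shuffle condition raised in the inline comments above can be illustrated with a small standalone sketch. This is not LLVM's implementation of `ShuffleVectorInst::isIdentityMask()`; the function name `isIdentityMaskSketch`, the `kUndefElem` sentinel, and the use of `std::vector<int>` for the mask are assumptions made purely for illustration. It treats a mask as an identity when it does not change the vector width and every defined element selects its own lane of the first source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel for an undefined mask element (printed as `undef` in IR).
constexpr int kUndefElem = -1;

// Returns true if the shuffle described by Mask merely forwards its first
// source: same width, and every defined element I selects lane I.
// NumSrcElts is the element count of each source vector.
bool isIdentityMaskSketch(const std::vector<int> &Mask, int NumSrcElts) {
  assert(NumSrcElts > 0 && "source vector must have at least one element");
  // A width-changing shuffle cannot be replaced by its operand.
  if (Mask.size() != static_cast<std::size_t>(NumSrcElts))
    return false;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I] == kUndefElem)
      continue; // an undefined lane places no constraint on the source
    if (Mask[I] != static_cast<int>(I))
      return false;
  }
  return true;
}
```

Under these assumptions, a mask such as `<0, 1, 2, 3>` on a 4-element source is an identity, while the broadcast-style mask `<0, undef, 0, 0, 0, 0, 0, 0>` in the CHECK lines above is not, since lanes 2 through 7 all read lane 0.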

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll

	[... 44 lines not shown ...]
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T40]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T40]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> <i32 poison, i32 poison, i32 6270, i32 poison>, i32 [[T9]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> <i32 poison, i32 poison, i32 6270, i32 poison>, i32 [[T9]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[T48]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[T48]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T47]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T47]], i32 3
	; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x i32> [[TMP11]], <8 x i32> [[TMP11]], <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 8, i32 9, i32 6, i32 11>
	; CHECK-NEXT: [[T69:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[TMP12]], i32 4			; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[TMP12]], i32 [[T34]], i32 6
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x i32> [[TMP10]], i32 1			; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T71]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T70:%.*]] = insertelement <8 x i32> [[T69]], i32 [[TMP13]], i32 5
	; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T70]], i32 [[T34]], i32 6
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP10]], i32 3
	; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[TMP14]], i32 7
	; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*			; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4			; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%t3 = load i32, i32* %t2, align 4			%t3 = load i32, i32* %t2, align 4
	%t4 = getelementptr inbounds i32, i32* %t2, i64 7			%t4 = getelementptr inbounds i32, i32* %t2, i64 7
	%t5 = load i32, i32* %t4, align 4			%t5 = load i32, i32* %t4, align 4
	%t8 = getelementptr inbounds i32, i32* %t2, i64 1			%t8 = getelementptr inbounds i32, i32* %t2, i64 1
	[... 43 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll

	[... 44 lines not shown ...]
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T40]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T40]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> <i32 poison, i32 poison, i32 6270, i32 poison>, i32 [[T9]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> <i32 poison, i32 poison, i32 6270, i32 poison>, i32 [[T9]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[T48]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[T48]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T47]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[T47]], i32 3
	; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = mul nsw <4 x i32> [[TMP4]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP10]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x i32> [[TMP11]], <8 x i32> [[TMP11]], <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 8, i32 9, i32 6, i32 11>
	; CHECK-NEXT: [[T69:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[TMP12]], i32 4			; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[TMP12]], i32 [[T34]], i32 6
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x i32> [[TMP10]], i32 1			; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T71]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T70:%.*]] = insertelement <8 x i32> [[T69]], i32 [[TMP13]], i32 5
	; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T70]], i32 [[T34]], i32 6
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP10]], i32 3
	; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[TMP14]], i32 7
	; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*			; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4			; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%t3 = load i32, i32* %t2, align 4			%t3 = load i32, i32* %t2, align 4
	%t4 = getelementptr inbounds i32, i32* %t2, i64 7			%t4 = getelementptr inbounds i32, i32* %t2, i64 7
	%t5 = load i32, i32* %t4, align 4			%t5 = load i32, i32* %t4, align 4
	%t8 = getelementptr inbounds i32, i32* %t2, i64 1			%t8 = getelementptr inbounds i32, i32* %t2, i64 1
	[... 43 lines not shown ...]
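
To see why the new CHECK lines in the two `vec_list_bias` tests compute the same 8-element vector as the old ones, the following C++17 sketch models `shufflevector` and `insertelement` on plain vectors and replays both sequences. The helper names (`shuffleSketch`, `insertSketch`, `oldSequence`, `newSequence`), the use of `std::optional<int>` for undefined lanes, and `-1` for an undefined mask element are all assumptions of this sketch, not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Each lane holds a value, or std::nullopt for an undef/poison lane.
using Lane = std::optional<int>;
using Vec = std::vector<Lane>;

// Models `shufflevector A, B, Mask`: an index below A.size() selects from A,
// a larger index selects from B, and -1 yields an undefined lane.
Vec shuffleSketch(const Vec &A, const Vec &B, const std::vector<int> &Mask) {
  assert(A.size() == B.size() && "both sources must have the same width");
  const int N = static_cast<int>(A.size());
  Vec Out;
  Out.reserve(Mask.size());
  for (int Idx : Mask) {
    assert(Idx >= -1 && Idx < 2 * N && "mask index out of range");
    if (Idx < 0)
      Out.push_back(std::nullopt);
    else if (Idx < N)
      Out.push_back(A[Idx]);
    else
      Out.push_back(B[Idx - N]);
  }
  return Out;
}

// Models `insertelement V, X, Pos`.
Vec insertSketch(Vec V, int X, std::size_t Pos) {
  assert(Pos < V.size() && "insert position out of range");
  V[Pos] = X;
  return V;
}

// Old output: widen TMP10, then three extract/insert pairs plus the T34 insert.
// Assumes lanes 0, 1 and 3 of Tmp10 are defined.
Vec oldSequence(const Vec &Tmp10, int T34) {
  Vec Poison(Tmp10.size()); // all lanes undefined
  Vec Tmp11 = shuffleSketch(Tmp10, Poison, {0, 1, 2, 3, -1, -1, -1, -1});
  Vec T69 = insertSketch(Tmp11, *Tmp10[0], 4);
  Vec T70 = insertSketch(T69, *Tmp10[1], 5);
  Vec T71 = insertSketch(T70, T34, 6);
  return insertSketch(T71, *Tmp10[3], 7);
}

// New output: one two-source shuffle of TMP11 with itself, then the T34 insert.
Vec newSequence(const Vec &Tmp10, int T34) {
  Vec Poison(Tmp10.size());
  Vec Tmp11 = shuffleSketch(Tmp10, Poison, {0, 1, 2, 3, -1, -1, -1, -1});
  Vec Tmp12 = shuffleSketch(Tmp11, Tmp11, {8, 9, 10, 11, 8, 9, 6, 11});
  return insertSketch(Tmp12, T34, 6);
}
```

In this model, mask indices 8 through 11 read lanes 0 through 3 of the second operand, and index 6 reads an undefined lane of the first operand that the following `insertelement` overwrites with `T34`, so both sequences yield the same lanes for any 4-element `Tmp10` whose lanes 0, 1 and 3 are defined.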

llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mcpu=cascadelake -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mcpu=cascadelake -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	define void @foo() {			define void @foo() {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CONV:%.*]] = uitofp i16 undef to float			; CHECK-NEXT: [[CONV:%.*]] = uitofp i16 undef to float
	; CHECK-NEXT: [[SUB:%.*]] = fsub float 6.553500e+04, undef			; CHECK-NEXT: [[SUB:%.*]] = fsub float 6.553500e+04, undef
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> poison, float [[SUB]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> poison, float [[SUB]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP18:%.*]], [[BB3:%.*]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP14:%.*]], [[BB3:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double			; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>			; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[CONV2]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[CONV2]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x double> poison, double [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = fcmp ogt <4 x double> [[TMP10]], [[TMP4]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = fptrunc <4 x double> [[TMP10]] to <4 x float>
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x double> [[TMP11]], double [[TMP12]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = select <4 x i1> [[TMP11]], <4 x float> [[TMP2]], <4 x float> [[TMP12]]
	; CHECK-NEXT: [[TMP14:%.*]] = fcmp ogt <4 x double> [[TMP13]], [[TMP4]]
	; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP16:%.*]] = fptrunc <4 x double> [[TMP15]] to <4 x float>
	; CHECK-NEXT: [[TMP17:%.*]] = select <4 x i1> [[TMP14]], <4 x float> [[TMP2]], <4 x float> [[TMP16]]
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP18]] = phi <4 x float> [ [[TMP17]], [[BB4]] ], [ [[TMP2]], [[BB2]] ]			; CHECK-NEXT: [[TMP14]] = phi <4 x float> [ [[TMP13]], [[BB4]] ], [ [[TMP2]], [[BB2]] ]
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	;			;
	entry:			entry:
	%conv = uitofp i16 undef to float			%conv = uitofp i16 undef to float
	%sub = fsub float 6.553500e+04, undef			%sub = fsub float 6.553500e+04, undef
	br label %bb1			br label %bb1

	bb1:			bb1:
	[... 39 lines not shown ...]

Revision Contents

Diff 394242

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/cmp_commute.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

llvm/test/Transforms/SLPVectorizer/X86/extracts-with-undefs.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled_store_crash.ll

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

llvm/test/Transforms/SLPVectorizer/X86/ordering-bug.ll

llvm/test/Transforms/SLPVectorizer/X86/phi.ll

llvm/test/Transforms/SLPVectorizer/X86/pr42022-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll

llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll
