This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly.
ClosedPublic

Authored by ABataev on Aug 12 2021, 8:26 AM.

Details

Summary

The SLP vectorizer emits extracts for externally used vectorized scalars and
estimates the cost of each such extract. But in many cases these
scalars are inputs to insertelement instructions forming a buildvector, and
instead of an extractelement/insertelement pair we can cost-estimate and
emit a series of shuffles, which can be further optimized.
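
For illustration only - a hypothetical <4 x i32> case, not taken from the patch - the idea is to replace the extract/insert pair that feeds a buildvector with a single shuffle:

; before: the vectorized scalar is extracted and re-inserted into a buildvector
%ext = extractelement <4 x i32> %vec, i32 0
%bv = insertelement <4 x i32> poison, i32 %ext, i32 0

; after: the buildvector lane is produced directly by a shuffle of the vectorized value
%bv.shuffle = shufflevector <4 x i32> %vec, <4 x i32> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>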

Tested using the test-suite (+SPEC2017); the tests passed, SLP was able to
vectorize more instructions in many cases, and the change reduced the
number of re-vectorization attempts (where we could try to vectorize
buildvector insertelements again and again).

Diff Detail

Event Timeline

ABataev created this revision.Aug 12 2021, 8:26 AM
ABataev requested review of this revision.Aug 12 2021, 8:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 12 2021, 8:26 AM
ABataev updated this revision to Diff 368855.Aug 26 2021, 5:24 AM

Rebased. Checked that the test SLPVectorizer/X86/remark_extract_broadcast.ll (mentioned in D108703) is updated as requested.

lebedev.ri added inline comments.
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–20 ↗(On Diff #368855)

Thanks! This is clearly an improvement, but these two shuffles are still redundant,
because either way you end up with the 0'th element of LD in some elements of the output.
In this case you could simply drop the first shuffle and do the second one directly.
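
Roughly, with made-up values rather than the exact test output (and showing a full broadcast just for illustration), the redundancy looks like this:

%tmp0 = shufflevector <8 x i16> %ld, <8 x i16> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7> ; identity, a no-op
%bcast = shufflevector <8 x i16> %tmp0, <8 x i16> poison, <8 x i32> zeroinitializer ; broadcast lane 0 of LD
; the first shuffle can be dropped and the second applied to %ld directly:
%bcast.direct = shufflevector <8 x i16> %ld, <8 x i16> poison, <8 x i32> zeroinitializer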

ABataev added inline comments.Aug 26 2021, 5:34 AM
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–20 ↗(On Diff #368855)

I think the first shuffle will simply be dropped in codegen (it is an identity shuffle), but I'll check what can be improved here.

lebedev.ri added inline comments.Aug 26 2021, 5:37 AM
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–20 ↗(On Diff #368855)

Ignoring more complicated cases, perhaps the key point here
is that TMP0 is an identity (=> single-source), non-width-changing shuffle,
so it can naturally be dropped. ShuffleVectorInst::isIdentityMask() might be relevant.
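
For concreteness, a hand-rolled version of that check - just a sketch; the in-tree helpers ShuffleVectorInst::isIdentityMask()/isIdentity() are what the comment points at:

#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// A shuffle is droppable here if it is single-source, does not change the
// vector width, and its mask is 0, 1, ..., N-1 (undef lanes allowed).
static bool isDroppableIdentityShuffle(const ShuffleVectorInst &SV) {
  auto *SrcTy = dyn_cast<FixedVectorType>(SV.getOperand(0)->getType());
  auto *DstTy = dyn_cast<FixedVectorType>(SV.getType());
  if (!SrcTy || !DstTy || SrcTy->getNumElements() != DstTy->getNumElements())
    return false; // width-changing shuffles are not handled
  ArrayRef<int> Mask = SV.getShuffleMask();
  for (int I = 0, E = Mask.size(); I != E; ++I)
    if (Mask[I] != -1 && Mask[I] != I) // -1 encodes an undef lane
      return false; // not an identity (this also rejects a second source)
  return true;
}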

ABataev added inline comments.Aug 26 2021, 5:40 AM
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–20 ↗(On Diff #368855)

Agree. Will check what can be done here to improve it.

ABataev updated this revision to Diff 368885.Aug 26 2021, 8:00 AM

Address comments

Please can you rebase?

Please can you rebase?

Sure, will do, just need to finish my work with other patches.

vporpo added a subscriber: vporpo.Nov 11 2021, 7:58 PM
RKSimon added inline comments.Dec 1 2021, 8:44 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5275

The (almost NFC) change to areTwoInsertFromSameBuildVector looks like it can be pulled out to simplify this patch.

RKSimon added inline comments.Dec 14 2021, 7:24 AM
llvm/test/Transforms/SLPVectorizer/X86/pr42022.ll
222

These look like NFC changes from the update script that could probably be pre-committed to reduce the patch?

It looks OK - but it's a LOT of dense code, which makes it very difficult to grok - better comments and possibly a simplification pass might be a good idea

It looks OK - but it's a LOT of dense code, which makes it very difficult to grok - better comments and possibly a simplification pass might be a good idea

Will try to split it.

rebase? not sure how big this is now

Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2022, 9:36 AM

rebase? not sure how big this is now

Working on it.

RKSimon added inline comments.May 13 2022, 7:20 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6605

I find this control flow very confusing - is the 'cast<InsertElementInst>(Base)' guaranteed to match IEBase? We break after the if() above, so we can't get here from there.

ABataev added inline comments.May 13 2022, 7:34 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6605

We just iterate through the insertelements that are not part of the vectorized buildvector.
For example:

%0 = insertelement %..., %a, 0
%1 = insertelement %0, %b, 1
%2 = insertelement %1, %c, 2

If %c is vectorized, we start looking through the buildvector, trying to find the vectorized base.
Start from %2: getTreeEntry(%2) returns nullptr, so go to %1.
getTreeEntry(%1) returns nullptr too (it is not part of the vectorized buildvector), so go to %0.
getTreeEntry(%0) is vectorized and returns E. Iterate through all vectorized insertelements and build a mask.
Put %2 on the list of insertelements that must be transformed to shuffles.

Later, we analyze all inserts between %1 and %2 (boundaries included): those that must be replaced with shuffles are replaced, while the other insertelements remain as they are and only have their base rewired to the new shuffles.
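
Roughly, the walk described above, as a simplified sketch with invented variable names (LastInsert, MaybeShuffled) rather than the patch's actual code:

// Walk the insertelement chain bottom-up (from %2 in the example) until an
// insert that belongs to a vectorized tree entry is found; the inserts seen
// on the way are the candidates for being rewritten as shuffles.
InsertElementInst *IE = LastInsert;                      // e.g. %2
SmallVector<InsertElementInst *, 4> MaybeShuffled;
const TreeEntry *E = nullptr;
while (IE) {
  if ((E = getTreeEntry(IE)))
    break;                                               // found the vectorized base (%0)
  MaybeShuffled.push_back(IE);                           // %2, then %1
  IE = dyn_cast<InsertElementInst>(IE->getOperand(0));   // follow the vector operand
}
// If E was found, build a shuffle mask from the insert positions and replace
// the inserts that must become shuffles; the remaining inserts only get their
// base operand rewired to the new shuffles.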

RKSimon accepted this revision.May 14 2022, 2:40 AM

LGTM

This revision is now accepted and ready to land.May 14 2022, 2:40 AM
This revision was landed with ongoing or failed builds.May 20 2022, 6:00 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a subscriber: fhahn.May 21 2022, 12:57 PM

It looks like this patch is causing SLPVectorizer to crash with the following IR. This blocks building SPEC on X86, so I'll go ahead and revert this for now to unblock testing.

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx"

define i64 @foo(ptr %arg, i32 %arg1) unnamed_addr #0 {
bb:
  %tmp = sub i32 undef, undef
  %tmp2 = sub nsw i32 undef, %tmp
  %tmp3 = add i32 undef, %tmp2
  %tmp4 = xor i32 %tmp3, undef
  %tmp5 = add i32 undef, %tmp4
  %tmp6 = sub i32 undef, undef
  %tmp7 = load i32, ptr undef, align 4
  %tmp8 = sub i32 %tmp7, undef
  %tmp9 = sub nsw i32 0, undef
  %tmp10 = add nsw i32 %tmp8, %tmp6
  %tmp11 = sub nsw i32 %tmp6, %tmp8
  %tmp12 = add i32 undef, %tmp10
  %tmp13 = xor i32 %tmp12, undef
  %tmp14 = add i32 undef, %tmp9
  %tmp15 = xor i32 %tmp14, undef
  %tmp16 = add i32 undef, %tmp11
  %tmp17 = xor i32 %tmp16, undef
  %tmp18 = add i32 %tmp13, %tmp5
  %tmp19 = add i32 %tmp18, undef
  %tmp20 = add i32 %tmp19, %tmp15
  %tmp21 = add i32 %tmp20, %tmp17
  %tmp22 = sub i32 undef, undef
  %tmp23 = add i32 undef, undef
  %tmp24 = sub i32 undef, undef
  %tmp25 = add nsw i32 %tmp23, undef
  %tmp26 = add nsw i32 %tmp24, %tmp22
  %tmp27 = sub nsw i32 %tmp22, %tmp24
  %tmp28 = add i32 undef, %tmp25
  %tmp29 = xor i32 %tmp28, undef
  %tmp30 = add i32 undef, %tmp26
  %tmp31 = xor i32 %tmp30, undef
  %tmp32 = add i32 undef, %tmp27
  %tmp33 = xor i32 %tmp32, undef
  %tmp34 = add i32 %tmp31, %tmp21
  %tmp35 = add i32 %tmp34, %tmp29
  %tmp36 = add i32 %tmp35, undef
  %tmp37 = add i32 %tmp36, %tmp33
  %tmp38 = sub nsw i32 undef, undef
  %tmp39 = add i32 undef, %tmp38
  %tmp40 = xor i32 %tmp39, undef
  %tmp41 = add i32 undef, %tmp37
  %tmp42 = add i32 %tmp41, 0
  %tmp43 = add i32 %tmp42, %tmp40
  %tmp44 = add i32 %tmp43, undef
  %tmp45 = add i32 undef, %tmp44
  %tmp46 = add i32 %tmp45, undef
  %tmp47 = add i32 %tmp46, undef
  %tmp48 = add i32 %tmp47, 0
  %tmp49 = add i32 undef, %tmp48
  %tmp50 = add i32 %tmp49, undef
  %tmp51 = add i32 %tmp50, undef
  %tmp52 = add i32 %tmp51, 0
  %tmp53 = add i32 undef, %tmp52
  %tmp54 = add i32 %tmp53, undef
  %tmp55 = add i32 %tmp54, undef
  %tmp56 = add i32 %tmp55, 0
  %tmp57 = add i32 0, %tmp56
  %tmp58 = add i32 %tmp57, 0
  %tmp59 = add i32 %tmp58, 0
  %tmp60 = add i32 %tmp59, 0
  %tmp61 = lshr i32 %tmp60, 16
  %tmp62 = add nuw nsw i32 undef, %tmp61
  %tmp63 = sub nsw i32 %tmp62, undef
  %tmp64 = zext i32 %tmp63 to i64
  %tmp65 = shl nuw i64 %tmp64, 32
  %tmp66 = add i64 %tmp65, undef
  ret i64 %tmp66
}

attributes #0 = { "target-features"="+64bit,+adx,+aes,+avx,+avx2" }
fhahn added a comment.May 24 2022, 1:25 AM

Unfortunately the latest version is still causing crashes when building SPEC2017 on X86. Reproducer below:

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx"

%struct.hoge = type { [7 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [4 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [7 x i32 (i8*, i32, i8*, i32)*], [7 x void (i8*, i8*, i8*, i8*, i32, i32*)*], [7 x void (i8*, i8*, i8*, i8*, i8*, i32, i32*)*], [7 x i32 (i8*, i32, i8*, i32)*], i32 (i8*, i32, i8*, i32, i32*)*, [4 x i64 (i8*, i32)*], [4 x i64 (i8*, i32)*], void (i8*, i32, i8*, i32, [4 x i32]*)*, float ([4 x i32]*, [4 x i32]*, i32)*, [7 x void (i8*, i8*, i8*, i8*, i32, i32*)*], [7 x void (i8*, i8*, i8*, i8*, i8*, i32, i32*)*], [7 x void (i8*, i8*, i8*, i8*, i32, i32*)*], [7 x void (i8*, i8*, i8*, i8*, i8*, i32, i32*)*], [7 x i32 (i32*, i16*, i32, i16*, i16*, i32, i32)*], void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)*, void (i8*, i8*, i32*)* }

define i64 @quux.51(i8* %arg, i32 %arg1) unnamed_addr #0 {
bb:
  %tmp = add i32 undef, undef
  %tmp2 = sub i32 undef, undef
  %tmp3 = add i32 undef, undef
  %tmp4 = sub i32 undef, undef
  %tmp5 = add nsw i32 %tmp3, %tmp
  %tmp6 = sub nsw i32 %tmp, %tmp3
  %tmp7 = add nsw i32 %tmp4, %tmp2
  %tmp8 = sub nsw i32 %tmp2, %tmp4
  %tmp9 = add i32 undef, %tmp5
  %tmp10 = xor i32 %tmp9, undef
  %tmp11 = add i32 undef, %tmp7
  %tmp12 = xor i32 %tmp11, undef
  %tmp13 = add i32 undef, %tmp6
  %tmp14 = xor i32 %tmp13, undef
  %tmp15 = add i32 undef, %tmp8
  %tmp16 = xor i32 %tmp15, undef
  %tmp17 = add i32 %tmp12, %tmp10
  %tmp18 = add i32 %tmp17, %tmp14
  %tmp19 = add i32 %tmp18, %tmp16
  %tmp20 = add i32 undef, undef
  %tmp21 = sub i32 undef, undef
  %tmp22 = add i32 undef, undef
  %tmp23 = sub i32 undef, undef
  %tmp24 = add nsw i32 %tmp22, %tmp20
  %tmp25 = sub nsw i32 %tmp20, %tmp22
  %tmp26 = add nsw i32 %tmp23, %tmp21
  %tmp27 = sub nsw i32 %tmp21, %tmp23
  %tmp28 = add i32 undef, %tmp24
  %tmp29 = xor i32 %tmp28, undef
  %tmp30 = add i32 undef, %tmp26
  %tmp31 = xor i32 %tmp30, undef
  %tmp32 = add i32 0, %tmp25
  %tmp33 = xor i32 %tmp32, 0
  %tmp34 = add i32 undef, %tmp27
  %tmp35 = xor i32 %tmp34, undef
  %tmp36 = add i32 %tmp31, %tmp19
  %tmp37 = add i32 %tmp36, %tmp29
  %tmp38 = add i32 %tmp37, %tmp33
  %tmp39 = add i32 %tmp38, %tmp35
  %tmp40 = add i32 undef, undef
  %tmp41 = sub i32 undef, undef
  %tmp42 = add i32 undef, undef
  %tmp43 = sub i32 undef, undef
  %tmp44 = add nsw i32 %tmp42, %tmp40
  %tmp45 = sub nsw i32 %tmp40, %tmp42
  %tmp46 = add nsw i32 %tmp43, %tmp41
  %tmp47 = sub nsw i32 %tmp41, %tmp43
  %tmp48 = add i32 undef, %tmp44
  %tmp49 = xor i32 %tmp48, undef
  %tmp50 = add i32 undef, %tmp46
  %tmp51 = xor i32 %tmp50, undef
  %tmp52 = add i32 undef, %tmp45
  %tmp53 = xor i32 %tmp52, undef
  %tmp54 = add i32 undef, %tmp47
  %tmp55 = xor i32 %tmp54, undef
  %tmp56 = add i32 %tmp51, %tmp39
  %tmp57 = add i32 %tmp56, %tmp49
  %tmp58 = add i32 %tmp57, %tmp53
  %tmp59 = add i32 %tmp58, %tmp55
  %tmp60 = load i32, i32* undef, align 4
  %tmp61 = add i32 undef, %tmp60
  %tmp62 = sub i32 %tmp60, undef
  %tmp63 = add i32 undef, undef
  %tmp64 = sub i32 undef, undef
  %tmp65 = add nsw i32 %tmp63, %tmp61
  %tmp66 = sub nsw i32 %tmp61, %tmp63
  %tmp67 = add nsw i32 %tmp64, %tmp62
  %tmp68 = sub nsw i32 %tmp62, %tmp64
  %tmp69 = add i32 undef, %tmp65
  %tmp70 = xor i32 %tmp69, undef
  %tmp71 = add i32 undef, %tmp67
  %tmp72 = xor i32 %tmp71, undef
  %tmp73 = add i32 undef, %tmp66
  %tmp74 = xor i32 %tmp73, undef
  %tmp75 = add i32 undef, %tmp68
  %tmp76 = xor i32 %tmp75, undef
  %tmp77 = add i32 %tmp72, %tmp59
  %tmp78 = add i32 %tmp77, %tmp70
  %tmp79 = add i32 %tmp78, %tmp74
  %tmp80 = add i32 %tmp79, %tmp76
  %tmp81 = add i32 undef, undef
  %tmp82 = sub i32 undef, undef
  %tmp83 = add i32 undef, undef
  %tmp84 = sub i32 undef, undef
  %tmp85 = add nsw i32 %tmp83, %tmp81
  %tmp86 = sub nsw i32 %tmp81, %tmp83
  %tmp87 = add nsw i32 %tmp84, %tmp82
  %tmp88 = sub nsw i32 %tmp82, %tmp84
  %tmp89 = add i32 undef, %tmp85
  %tmp90 = xor i32 %tmp89, undef
  %tmp91 = add i32 undef, %tmp87
  %tmp92 = xor i32 %tmp91, undef
  %tmp93 = add i32 undef, %tmp86
  %tmp94 = xor i32 %tmp93, undef
  %tmp95 = add i32 undef, %tmp88
  %tmp96 = xor i32 %tmp95, undef
  %tmp97 = add i32 %tmp92, %tmp80
  %tmp98 = add i32 %tmp97, %tmp90
  %tmp99 = add i32 %tmp98, %tmp94
  %tmp100 = add i32 %tmp99, %tmp96
  %tmp101 = add i32 undef, undef
  %tmp102 = sub i32 undef, undef
  %tmp103 = add i32 undef, undef
  %tmp104 = sub i32 undef, undef
  %tmp105 = add nsw i32 %tmp103, %tmp101
  %tmp106 = sub nsw i32 %tmp101, %tmp103
  %tmp107 = add nsw i32 %tmp104, %tmp102
  %tmp108 = sub nsw i32 %tmp102, %tmp104
  %tmp109 = add i32 undef, %tmp105
  %tmp110 = xor i32 %tmp109, undef
  %tmp111 = add i32 undef, %tmp107
  %tmp112 = xor i32 %tmp111, undef
  %tmp113 = add i32 undef, %tmp106
  %tmp114 = xor i32 %tmp113, undef
  %tmp115 = add i32 undef, %tmp108
  %tmp116 = xor i32 %tmp115, undef
  %tmp117 = add i32 %tmp112, %tmp100
  %tmp118 = add i32 %tmp117, %tmp110
  %tmp119 = add i32 %tmp118, %tmp114
  %tmp120 = add i32 %tmp119, %tmp116
  %tmp121 = add i32 undef, undef
  %tmp122 = sub i32 undef, undef
  %tmp123 = add i32 undef, undef
  %tmp124 = sub i32 undef, undef
  %tmp125 = add nsw i32 %tmp123, %tmp121
  %tmp126 = sub nsw i32 %tmp121, %tmp123
  %tmp127 = add nsw i32 %tmp124, %tmp122
  %tmp128 = sub nsw i32 %tmp122, %tmp124
  %tmp129 = add i32 undef, %tmp125
  %tmp130 = xor i32 %tmp129, undef
  %tmp131 = add i32 undef, %tmp127
  %tmp132 = xor i32 %tmp131, undef
  %tmp133 = add i32 undef, %tmp126
  %tmp134 = xor i32 %tmp133, undef
  %tmp135 = add i32 undef, %tmp128
  %tmp136 = xor i32 %tmp135, undef
  %tmp137 = add i32 %tmp132, %tmp120
  %tmp138 = add i32 %tmp137, %tmp130
  %tmp139 = add i32 %tmp138, %tmp134
  %tmp140 = add i32 %tmp139, %tmp136
  %tmp141 = add i32 undef, undef
  %tmp142 = sub i32 undef, undef
  %tmp143 = add i32 undef, undef
  %tmp144 = sub i32 undef, undef
  %tmp145 = add nsw i32 %tmp143, %tmp141
  %tmp146 = sub nsw i32 %tmp141, %tmp143
  %tmp147 = add nsw i32 %tmp144, %tmp142
  %tmp148 = sub nsw i32 %tmp142, %tmp144
  %tmp149 = add i32 undef, %tmp145
  %tmp150 = xor i32 %tmp149, undef
  %tmp151 = add i32 undef, %tmp147
  %tmp152 = xor i32 %tmp151, undef
  %tmp153 = add i32 undef, %tmp146
  %tmp154 = xor i32 %tmp153, undef
  %tmp155 = add i32 undef, %tmp148
  %tmp156 = xor i32 %tmp155, undef
  %tmp157 = add i32 %tmp152, %tmp140
  %tmp158 = add i32 %tmp157, %tmp150
  %tmp159 = add i32 %tmp158, %tmp154
  %tmp160 = add i32 %tmp159, %tmp156
  %tmp161 = and i32 %tmp160, 65535
  %tmp162 = add nuw nsw i32 %tmp161, undef
  %tmp163 = sub nsw i32 %tmp162, undef
  %tmp164 = zext i32 %tmp163 to i64
  %tmp165 = shl nuw i64 %tmp164, 32
  %tmp166 = add i64 %tmp165, undef
  ret i64 %tmp166
}

attributes #0 = { "target-features"="+64bit,+adx,+aes,+avx,+avx2" }

Unfortunately the latest version is still causing crashes when building SPEC2017 on X86. Reproducer below:

Hi Florian, I tried to reproduce but was unable to:

opt -slp-vectorizer -S ./repro1.ll
; ModuleID = './repro1.ll'
source_filename = "./repro1.ll"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx"

define i64 @quux.51(i8* %arg, i32 %arg1) unnamed_addr #0 {
bb:
  %tmp60 = load i32, i32* undef, align 4
  %0 = insertelement <32 x i32> poison, i32 %tmp60, i32 0
  %shuffle = shufflevector <32 x i32> %0, <32 x i32> poison, <32 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %1 = add <32 x i32> %shuffle, poison
  %2 = sub <32 x i32> %shuffle, poison
  %3 = shufflevector <32 x i32> %1, <32 x i32> %2, <32 x i32> <i32 0, i32 33, i32 2, i32 35, i32 36, i32 5, i32 6, i32 39, i32 40, i32 9, i32 10, i32 43, i32 44, i32 13, i32 14, i32 47, i32 48, i32 17, i32 18, i32 51, i32 52, i32 21, i32 22, i32 55, i32 56, i32 25, i32 26, i32 59, i32 60, i32 29, i32 30, i32 63>
  %4 = shufflevector <32 x i32> %3, <32 x i32> poison, <32 x i32> <i32 2, i32 3, i32 0, i32 1, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12, i32 19, i32 18, i32 17, i32 16, i32 23, i32 22, i32 21, i32 20, i32 27, i32 26, i32 25, i32 24, i32 31, i32 30, i32 29, i32 28>
  %5 = add nsw <32 x i32> %3, %4
  %6 = sub nsw <32 x i32> %3, %4
  %7 = shufflevector <32 x i32> %5, <32 x i32> %6, <32 x i32> <i32 0, i32 1, i32 34, i32 35, i32 4, i32 5, i32 38, i32 39, i32 8, i32 9, i32 42, i32 43, i32 12, i32 13, i32 46, i32 47, i32 16, i32 17, i32 50, i32 51, i32 20, i32 21, i32 54, i32 55, i32 24, i32 25, i32 58, i32 59, i32 28, i32 29, i32 62, i32 63>
  %8 = add <32 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>, %7
  %9 = xor <32 x i32> %8, <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %10 = call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %9)
  %tmp161 = and i32 %10, 65535
  %tmp162 = add nuw nsw i32 %tmp161, undef
  %tmp163 = sub nsw i32 %tmp162, undef
  %tmp164 = zext i32 %tmp163 to i64
  %tmp165 = shl nuw i64 %tmp164, 32
  %tmp166 = add i64 %tmp165, undef
  ret i64 %tmp166
}

; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1

attributes #0 = { "target-features"="+64bit,+adx,+aes,+avx,+avx2" }
attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }

Could you check one more time, please?

fhahn added a comment.May 24 2022, 4:09 AM

Could you check one more time, please?

Yeah I just checked and this crashes for me with a release + assert build (commit is 96323c9f4c10bef5cb5d527970cabc73eab8aa21)

The assertion is: Assertion failed: (II && "Must be an insertelement instruction."), function vectorizeTree, file SLPVectorizer.cpp, line 8543.

Could you check one more time, please?

Yeah I just checked and this crashes for me with a release + assert build (commit is 96323c9f4c10bef5cb5d527970cabc73eab8aa21)

The assertion is: Assertion failed: (II && "Must be an insertelement instruction."), function vectorizeTree, file SLPVectorizer.cpp, line 8543.

Checked on the debug build, will check with rel+assert

Could you check one more time, please?

Yeah I just checked and this crashes for me with a release + assert build (commit is 96323c9f4c10bef5cb5d527970cabc73eab8aa21)

The assertion is: Assertion failed: (II && "Must be an insertelement instruction."), function vectorizeTree, file SLPVectorizer.cpp, line 8543.

Still unable to reproduce but I'll try to investigate it.

fhahn added a comment.May 24 2022, 5:09 AM

Could you check one more time, please?

Yeah I just checked and this crashes for me with a release + assert build (commit is 96323c9f4c10bef5cb5d527970cabc73eab8aa21)

The assertion is: Assertion failed: (II && "Must be an insertelement instruction."), function vectorizeTree, file SLPVectorizer.cpp, line 8543.

Still unable to reproduce but I'll try to investigate it.

I'm building on macOS, which defaults to libc++; it's possible that this is why you are not seeing the crash. I left an inline comment on a sort call; replacing it with stable_sort fixes the crash.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6785

Is it possible that the relative order of elements that compare as equal matters in the code below? With stable_sort, I am not seeing the crash.

ABataev added inline comments.May 24 2022, 5:36 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6785

Let me check - yeah, this is most probably caused by the libc++ difference.
I used sort here because I expected no difference between the sort and stable_sort results.
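
As a generic illustration of the failure mode (not the actual SLP comparator): when the comparator reports two elements as equal, sort may place them in either order - and different standard libraries (libc++ vs. libstdc++) may pick different orders - while stable_sort preserves their original relative order:

#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

int main() {
  // Two entries share the same key; their relative order is what matters downstream.
  std::vector<std::pair<int, const char *>> Inserts = {
      {0, "first"}, {0, "second"}, {1, "third"}};
  auto ByKey = [](const auto &A, const auto &B) { return A.first < B.first; };

  auto Unstable = Inserts;
  std::sort(Unstable.begin(), Unstable.end(), ByKey);    // "first"/"second" order is unspecified

  auto Stable = Inserts;
  std::stable_sort(Stable.begin(), Stable.end(), ByKey); // "first" is guaranteed to stay before "second"

  for (const auto &P : Stable)
    std::printf("%d %s\n", P.first, P.second);
}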

ABataev added inline comments.May 24 2022, 6:09 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6785

Could you check again after f9c806ae5c53c990a935c46ba351cdcfb1271c58?

fhahn added inline comments.May 27 2022, 5:03 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6785

It doesn't crash any longer, thanks!

ABataev added inline comments.May 27 2022, 5:27 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6785

Great!

scui added a subscriber: scui.EditedJun 16 2022, 6:36 AM

Our SPEC build on PowerPC failed due to this patch. The following reproducer (gd.ll) is extracted from the gcc_r build:

target datalayout = "E-m:a-p:32:32-i64:64-n32"
target triple = "powerpc-ibm-aix7.2.0.0"

%union.tree_node = type { %struct.tree_optimization_option }
%struct.tree_optimization_option = type { %struct.tree_common, %struct.cl_optimization }
%struct.tree_common = type { %struct.tree_base, %union.tree_node*, %union.tree_node* }
%struct.tree_base = type { i64 }
%struct.cl_optimization = type { i32 }
%struct.c_declarator = type { i32, %struct.c_declarator*, i32, %union.anon.1 }
%union.anon.1 = type { %struct.anon.443 }
%struct.anon.443 = type { %union.tree_node*, i32, %union.tree_node*, i8 }
%struct.c_declspecs = type { %union.tree_node*, %union.tree_node*, %union.tree_node*, %union.tree_node*, i32, i32, i8, i32, i16, i8 }

@flag_isoc99 = internal unnamed_addr global i1 false, align 4
@pedantic = internal global i32 0, align 4

; Function Attrs: nounwind
define fastcc %union.tree_node* @grokdeclarator(%struct.c_declarator* noundef readonly %declarator, %struct.c_declspecs* nocapture noundef %declspecs) unnamed_addr #0 {
entry:
  %type = getelementptr inbounds %struct.c_declspecs, %struct.c_declspecs* %declspecs, i32 0, i32 0
  %thread_p = getelementptr inbounds %struct.c_declspecs, %struct.c_declspecs* %declspecs, i32 0, i32 8
  %p0 = bitcast %struct.c_declarator* %declarator to i64*
  %t0 = load i64, i64* %p0, align 8
  %cmp00 = icmp eq i64 %t0, 0
  br i1 %cmp00, label %if.end10, label %cleanup

if.end10:                                         ; preds = %entry
  %t1 = load %union.tree_node*, %union.tree_node** %type, align 4
  %t2 = getelementptr %union.tree_node, %union.tree_node* %t1, i32 0, i32 0, i32 0, i32 0, i32 0
  %bf.load1 = load i64, i64* %t2, align 8
  %bf.lshr.mask5.i = and i64 %bf.load1, -281474976710656
  %cmp10 = icmp eq i64 %bf.lshr.mask5.i, 4222124650659840
  %extract.t814 = trunc i64 %bf.load1 to i8
  %extract.t817 = trunc i64 %bf.load1 to i32
  %extract819 = lshr i64 %bf.load1, 43
  %extract.t820 = trunc i64 %extract819 to i32
  %extract823 = lshr i64 %bf.load1, 44
  %extract.t824 = trunc i64 %extract823 to i32
  br i1 %cmp10, label %if.then20, label %if.else20

if.then20:                                        ; preds = %if.end10
  %type1.i33 = getelementptr inbounds %union.tree_node, %union.tree_node* %t1, i32 0, i32 0, i32 0, i32 2
  %t3 = load %union.tree_node*, %union.tree_node** %type1.i33, align 4
  %t4 = getelementptr %union.tree_node, %union.tree_node* %t3, i32 0, i32 0, i32 0, i32 0, i32 0
  %bf.load2 = load i64, i64* %t4, align 8
  %extract.t = trunc i64 %bf.load2 to i8
  %extract.t816 = trunc i64 %bf.load2 to i32
  %extract = lshr i64 %bf.load2, 43
  %extract.t818 = trunc i64 %extract to i32
  %extract821 = lshr i64 %bf.load2, 44
  %extract.t822 = trunc i64 %extract821 to i32
  br label %if.else20

if.else20:                                        ; preds = %if.then20, %if.end10
  %bf.load.off0 = phi i8 [ %extract.t, %if.then20 ], [ %extract.t814, %if.end10 ]
  %bf.load.off0815 = phi i32 [ %extract.t816, %if.then20 ], [ %extract.t817, %if.end10 ]
  %bf.load.off43 = phi i32 [ %extract.t818, %if.then20 ], [ %extract.t820, %if.end10 ]
  %bf.load.off44 = phi i32 [ %extract.t822, %if.then20 ], [ %extract.t824, %if.end10 ]
  %type.addr.0.lcssa.i = phi %union.tree_node* [ %t3, %if.then20 ], [ %t1, %if.end10 ]
  %p5 = getelementptr inbounds %union.tree_node, %union.tree_node* %type.addr.0.lcssa.i, i32 0, i32 0, i32 1, i32 0
  %p9 = getelementptr inbounds %struct.c_declspecs, %struct.c_declspecs* %declspecs, i32 0, i32 9
  %bf.load154 = load i16, i16* %thread_p, align 4
  %bf.lshr155 = lshr i16 %bf.load154, 7
  %bf.clear156 = and i16 %bf.lshr155, 1
  %bf.cast157 = zext i16 %bf.clear156 to i32
  %bf.cast162 = and i32 %bf.load.off43, 1
  %add = add nuw nsw i32 %bf.cast162, %bf.cast157
  %bf.load168 = load i32, i32* %p5, align 4
  %bf.lshr169 = lshr i32 %bf.load168, 18
  %t6 = insertelement <2 x i16> poison, i16 %bf.load154, i64 0
  %t7 = shufflevector <2 x i16> %t6, <2 x i16> poison, <2 x i32> zeroinitializer
  %t8 = lshr <2 x i16> %t7, <i16 5, i16 6>
  %t9 = and <2 x i16> %t8, <i16 1, i16 1>
  %t10 = zext <2 x i16> %t9 to <2 x i32>
  %t11 = insertelement <2 x i32> poison, i32 %bf.lshr169, i64 0
  %t12 = insertelement <2 x i32> %t11, i32 %bf.load.off44, i64 1
  %t13 = and <2 x i32> %t12, <i32 1, i32 1>
  %t14 = add nuw nsw <2 x i32> %t13, %t10
  %t15 = load i8, i8* %p9, align 2
  %conv188 = zext i8 %t15 to i32
  %cmp20 = icmp eq i8 %t15, 0
  %conv192 = and i32 %bf.load.off0815, 255
  %cond196 = select i1 %cmp20, i32 %bf.load.off0815, i32 %conv188
  %t16 = load i32, i32* @pedantic, align 4
  %cmp30 = icmp eq i32 %t16, 0
  %.b28 = load i1, i1* @flag_isoc99, align 4
  %t17 = insertelement <2 x i1> poison, i1 %cmp20, i64 0
  %t18 = insertelement <2 x i1> %t17, i1 %cmp30, i64 1
  %t19 = zext <2 x i1> %t18 to <2 x i64>
  %or.cond1969 = select i1 %cmp30, i1 true, i1 %.b28
  br i1 %or.cond1969, label %cleanup, label %if.else30

if.else30:                                        ; preds = %if.else20
  %cmp40 = icmp ugt i32 %add, 1
  br i1 %cmp40, label %if.then40, label %if.end40

if.then40:                                        ; preds = %if.else30
  br label %if.end40

if.end40:                                         ; preds = %if.then40, %if.else30
  %t20 = extractelement <2 x i32> %t14, i64 0
  %cmp50 = icmp ugt i32 %t20, 1
  br i1 %cmp50, label %if.then50, label %if.end50

if.then50:                                        ; preds = %if.end40
  br label %if.end50

if.end50:                                         ; preds = %if.then50, %if.end40
  br label %cleanup

cleanup:                                          ; preds = %if.end50, %if.else20, %entry
  ret %union.tree_node* null
}

attributes #0 = { nounwind "approx-func-fp-math"="true" "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="pwr10" "target-features"="+altivec,+bpermd,+crbits,+crypto,+direct-move,+extdiv,+isa-v206-instructions,+isa-v207-instructions,+isa-v30-instructions,+isa-v31-instructions,+mma,+paired-vector-memops,+pcrelative-memops,+power10-vector,+power8-vector,+power9-vector,+prefix-instrs,+vsx,-htm,-privileged,-quadword-atomics,-rop-protect,-spe" }

Here is the crash dump with the latest SLPVectorizer.cpp (as of June 16). To reproduce:

opt  -slp-vectorizer gd.ll

opt: llvm/main/llvm-project/llvm/lib/IR/Instructions.cpp:2012: llvm::ShuffleVectorInst::ShuffleVectorInst(llvm::Value *, llvm::Value *, ArrayRef<int>, const llvm::Twine &, llvm::Instruction *): Assertion `isValidOperands(V1, V2, Mask) && "Invalid shuffle vector instruction operands!"' failed.

Stack dump:
0. Program arguments: llvm/main/build/bin/opt -slp-vectorizer gd.ll
#0 0x0000000012ea16d4 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (llvm/main/build/bin/opt+0x12ea16d4)
#1 0x0000000012ea1af4 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
#2 0x0000000012e9e818 llvm::sys::RunSignalHandlers() (llvm/main/build/bin/opt+0x12e9e818)
#3 0x0000000012ea1dbc SignalHandler(int) Signals.cpp:0:0
#4 0x00007d17768b04c8 (linux-vdso64.so.1+0x4c8)
#5 0x00007d1776130468 __libc_signal_restore_set /build/glibc-tRXAGY/glibc-2.31/signal/../sysdeps/unix/sysv/linux/internal-signals.h:86:3
#6 0x00007d1776130468 raise /build/glibc-tRXAGY/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:48:3
#7 0x00007d1776107cd0 abort /build/glibc-tRXAGY/glibc-2.31/stdlib/abort.c:79:7
#8 0x00007d177611f5dc __assert_fail_base /build/glibc-tRXAGY/glibc-2.31/assert/assert.c:92:3
#9 0x00007d177611f680 __assert_fail /build/glibc-tRXAGY/glibc-2.31/assert/assert.c:101:3
#10 0x00000000124870cc llvm::ShuffleVectorInst::ShuffleVectorInst(llvm::Value*, llvm::Value*, llvm::ArrayRef<int>, llvm::Twine const&, llvm::Instruction*) (llvm/main/build/bin/opt+0x124870cc)
#11 0x000000001064b62c llvm::IRBuilderBase::CreateShuffleVector(llvm::Value*, llvm::Value*, llvm::ArrayRef<int>, llvm::Twine const&) (llvm/main/build/bin/opt+0x1064b62c)
#12 0x000000001318a698 llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseMapPair<llvm::Value*, unsigned int>>, std::vector<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>>, std::allocator<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>>>>>&)::$_69::operator()(llvm::Value*, llvm::Value*, llvm::ArrayRef<int>) const SLPVectorizer.cpp:0:0
#13 0x000000001314098c llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseMapPair<llvm::Value*, unsigned int>>, std::vector<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>>, std::allocator<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>>>>>&) (llvm/main/build/bin/opt+0x1314098c)
#14 0x0000000013150de0 llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) (llvm/main/build/bin/opt+0x13150de0)
......

Can you please take a look? Thanks!

There is another issue that I tracked down to this patch, but it is somewhat hidden. To reveal it, please apply the attached patch (it basically enables expensive checks and adds a verifyFunction call right after the vectorized code is generated).

Crash looks like this:
Instruction does not dominate all uses!

%41 = insertelement <4 x i32> %40, i32 %32, i32 1
%39 = insertelement <4 x i32> %41, i32 poison, i32 2

opt: /path/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8404: llvm::Value* llvm::slpvectorizer::BoUpSLP::vectorizeTree(): Assertion `!verifyFunction(*F, &dbgs()) && "Broken after vec"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: bin/opt -slp-vectorizer -mcpu=skylake -disable-output reduced.ll
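
For reference, the kind of hook being described - a small sketch mirroring the assertion text in the dump above, not the attached patch itself:

#include <cassert>
#include "llvm/IR/Verifier.h"   // llvm::verifyFunction - returns true if the IR is broken
#include "llvm/Support/Debug.h" // llvm::dbgs

// ... inside BoUpSLP::vectorizeTree(), right after the vectorized code is emitted:
assert(!verifyFunction(*F, &dbgs()) && "Broken after vec");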

There is another issue that I tracked down to this patch, but it is somewhat hidden. To reveal it, please apply the attached patch (it basically enables expensive checks and adds a verifyFunction call right after the vectorized code is generated).

Hi Valery, thanks for the report. I will prepare a fix later today or tomorrow.

There is another issue that I tracked down to this patch, but it is somewhat hidden. To reveal it, please apply the attached patch (it basically enables expensive checks and adds a verifyFunction call right after the vectorized code is generated).

Investigated. This is not quite a bug, but some junk is left behind that requires cleanup. I'll add code to do this extra cleanup to avoid any problems; I believe it may also improve compile time in some cases.