This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Add insertelement instructions to vectorizable tree
ClosedPublic

Authored by anton-afanasyev on Mar 16 2021, 8:21 AM.

Details

Summary

Add a new type of tree node for an InsertElementInst chain forming a vector.
These instructions can be either removed or replaced by shuffles during
vectorization, and adding this node to the cost model lets us estimate
their cost naturally, getting rid of the CompensateCost tricks and reducing
further work for InstCombine. This fixes PR40522 and PR35732 in a natural way.
This patch is also the first step towards revectorization of partially
vectorized code (to fix PR42022 completely). After adding inserts to the tree,
the next step is to add vector instructions there (for instance, to merge
store <2 x float> and store <2 x float> into store <4 x float>).

Fixes PR40522 and PR35732.
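
For illustration, a minimal sketch (hypothetical IR, not taken from the patch's tests) of the kind of build-vector chain the new node models and how it is expected to be handled:

; Before: scalar fadds feeding an insertelement chain (a build vector).
define <2 x float> @before(float %a0, float %a1, float %b0, float %b1) {
  %s0 = fadd float %a0, %b0
  %s1 = fadd float %a1, %b1
  %i0 = insertelement <2 x float> undef, float %s0, i32 0
  %i1 = insertelement <2 x float> %i0, float %s1, i32 1
  ret <2 x float> %i1
}

; After: with the inserts in the tree, their operands are gathered into vectors,
; the two fadds become one <2 x float> fadd, and the insert chain is replaced by
; that vector value.
define <2 x float> @after(float %a0, float %a1, float %b0, float %b1) {
  %va0 = insertelement <2 x float> undef, float %a0, i32 0
  %va = insertelement <2 x float> %va0, float %a1, i32 1
  %vb0 = insertelement <2 x float> undef, float %b0, i32 0
  %vb = insertelement <2 x float> %vb0, float %b1, i32 1
  %r = fadd <2 x float> %va, %vb
  ret <2 x float> %r
}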

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3597

Thanks, done

3807

Thanks, done

4142

If any of the elements is an InsertElementInst, they all are, so checking one is enough. And we can't move this check closer to the end, since the next check can return true.

4363

Thanks, done

4364

Thanks, done

4386–4387

It's definitely not an Identity and not a Reverse here.

4785

Thanks, done

4846–4888

Do you mean going back to generating extracts here?
I don't believe that's a good approach, for several reasons:

  1. It doesn't match the common logic of vectorization (replacing scalar operations with one vector operation),
  2. If we leave insertelements unvectorized (i.e. replaced by shuffles), these inserts may be processed again somehow,
  3. If we can deal with it in SLPVectorizer, why not do it early? It's not too hard to do.
5399–5400

Thanks, done

6712–6713

Yes, I've planned to do this, but can't: we make a decision about vectorization based on the _operands_:

 InstructionsState S = getSameOpcode(VL);
 if (!S.getOpcode())
   return false;
...
unsigned Sz = R.getVectorElementSize(I0);

but then vectorization starts from the inserts.

anton-afanasyev marked 10 inline comments as done.

Address some comments

anton-afanasyev marked 9 inline comments as done.May 5 2021, 12:45 PM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2887–2892

Ok, moved to GetConstantOperand()

2893–2900

Ok, added GetConstantOperand()

3816

Thanks, done this way.

3818–3824

I've removed this commented-out code after switching to getScalarizationOverhead() as @RKSimon suggested.

4142

Moreover, I had to move it up, into the isTreeTinyAndNotFullyVectorizable() function, since we need to explicitly exclude the cases of vectorizing inserts of gathered values. That makes no sense, and otherwise we can fall into an infinite loop of generating inserts and vectorizing them again (for instance, when -slp-min-tree-size=2 is set).

6712–6713

Ok, eventually found a way to drop InsertUses parameter.

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll
225

I've debugged this case. Without acceleration @llvm.exp.v4f32 is too expensive (call cost is 18 = 58 - 40, vs -30 = 10 - 40 with acceleration), whereas @llvm.exp.v2f32 is cheaper (call cost is 6 = 26 - 20).

308

This case is the same as above.

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
186

Thanks, fixed

llvm/test/Transforms/SLPVectorizer/X86/sext-inseltpoison.ll
4

Ok, fixed

llvm/test/Transforms/SLPVectorizer/X86/sext.ll
4

Ok, fixed

anton-afanasyev marked 9 inline comments as done.

Addressed all comments

ABataev added inline comments.May 5 2021, 2:12 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2893–2894
for (Value *V : VL)
  MinIndex = std::min<int>(MinIndex, GetConstantOperand(V));
3811–3812

Turn GetConstantOperand into a function and use it

4363

ArrayRef<Value *>

4368

Value *

4369–4372

std::min<int> and use a function

4377–4378

Use a function to extract index

4383

Scalars.contains(Insert) or .count(Insert)

anton-afanasyev marked 7 inline comments as done.May 5 2021, 10:46 PM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2893–2894

Ok, done

3811–3812

Ok, done

4363

Ok, done

4368

Ok, done

4369–4372

Ok, done

4377–4378

Ok, done

4383

Ok, done

anton-afanasyev marked 7 inline comments as done.

Addressed comments

Reused getOperandIndex() for InsertElement instructions

Fix names, minors

ABataev added inline comments.May 6 2021, 5:06 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4869–4870
SmallVector<int, 16> Mask(NumElts, UndefMaskElem);
std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0);

Also, I think just std::iota(Mask.begin(), Mask.end(), 0); shall work too

4871–4879

Can we use ShuffleBuilder here?

4871–4879

Did you include these costs in the cost model?

5385

Scalars.contains(Insert)

anton-afanasyev marked 4 inline comments as done.May 6 2021, 9:14 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4869–4870

Thanks, changed to std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0);

Yes, std::iota(Mask.begin(), Mask.end(), 0); gives the same result, but wouldn't it lead to redundant code being lowered?

4871–4879

They are included in getScalarizationOverhead().

4871–4879

Yes, I thought about ShuffleBuilder, but here we just need to create two shuffles of a special kind for vector resizing. It would require expanding ShuffleInstructionBuilder; I don't think it's worth it.

5385

Thanks, done

anton-afanasyev marked 4 inline comments as done.

Addressed new comments

ABataev added inline comments.May 6 2021, 9:25 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4869–4870

std::iota(Mask.begin(), Mask.end(), 0); will produce the same code as std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0); but only if NumScalars <= Mask.size() * 2. Otherwise the compiler may crash in some cases. So, better to keep std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0);

4871–4879

I rather doubt it. First, you subtract the scalarization overhead cost from the vector cost, but here you need to add the costs of the subvector insert and the permutation of 2 vectors.

anton-afanasyev marked an inline comment as done.May 7 2021, 12:09 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4869–4870

Mask.size() is just NumElts and NumScalars <= NumElts (is it worth adding an assert for this?), so NumScalars <= Mask.size() * 2 in all cases.

What I meant by "the same result" is that the shuffle

%a = shufflevector <2 x float> %b, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>

is equivalent to

%a = shufflevector <2 x float> %b, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

(here NumScalars == 2 and NumElts == 4).

ABataev added inline comments.May 7 2021, 3:02 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4869–4870

We may have a situation where, say, NumScalars is 2 and NumElts is 8, if the same scalars are inserted several times.

anton-afanasyev marked 3 inline comments as done.May 7 2021, 3:20 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4869–4870

Do you mean "NumScalars is 8 and NumElts is 2"?
This case is excluded, we can get only unique inserts (inserts with unique index) from findBuildAggregate() for now. Also, we even assert

assert(ReuseShuffleIndicies.empty() && "All inserts should be unique");

at first line of InsertElement at the buildTree_rec() function.

anton-afanasyev marked an inline comment as done.May 7 2021, 3:31 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4871–4879

Did you include these costs in the cost model?

They are included in getScalarizationOverhead().

I rather doubt it. First, you subtract the scalarization overhead cost from the vector cost, but here you need to add the costs of the subvector insert and the permutation of 2 vectors.

These subvector inserts are actually lowered to a nop, so they cost nothing. We need this code when processing a big vector of scalars part by part, but every chunk fills a whole vector register (the condition bits >= MinVecRegSize), so there is no actual inserting. The result, consisting of several vector registers, is then returned.

ABataev added inline comments.May 7 2021, 3:34 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4869–4870

Well, maybe :) Anyway, let's keep std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0); just to avoid fixing something in the future when we add support for vectorization of more code patterns.

ABataev added inline comments.May 7 2021, 3:36 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4871–4879

This might not be true for some targets/code patterns. Plus, the second pattern is a permutation/combination of 2 vectors; its cost is at least 1.

anton-afanasyev marked 2 inline comments as done.May 7 2021, 3:52 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4869–4870

Sure!

4871–4879

This might not be true for some targets/code patterns. Plus, the second pattern is a permutation/combination of 2 vectors; its cost is at least 1.

Both of these shuffles together are just the single "inserting" operation; we need the first one to expand the source vector, since shufflevector needs operands of the same size.

We can add the cost of these shuffles, but that prevents several vectorizations that were performed before, since this cost is redundant (maybe TTI->getShuffleCost() should be tuned for TargetTransformInfo::SK_InsertSubvector?).
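
For reference, a sketch of the two-shuffle "resize" pair described above (the names and the offset-0 placement are illustrative, not taken from the generated code):

; The first shuffle widens the narrow vectorized chunk so both operands have the
; same length; the second blends it into the destination lanes (offset 0 here).
%widened = shufflevector <2 x float> %sub, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%result = shufflevector <4 x float> %dest, <4 x float> %widened, <4 x i32> <i32 4, i32 5, i32 2, i32 3>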

ABataev added inline comments.May 7 2021, 4:09 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4871–4879

We need to check that the vectorized code is really profitable and, if so, tune the cost model, yes.

RKSimon added inline comments.May 7 2021, 4:31 AM
llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h
96–97

Remove the references to InsertUses in the doxygen, otherwise we'll get -Wdocumentation warnings.

anton-afanasyev marked 3 inline comments as done.

Fixed comment

llvm/include/llvm/Transforms/Vectorize/SLPVectorizer.h
96–97

Thanks, done

ABataev added inline comments.May 7 2021, 6:15 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2887–2897

I think this can be simplified:

int Offset = *getInsertIndex(VL[0], 0);
ValueList Operands(VL.size());
Operands[0] = cast<Instruction>(VL[0])->getOperand(1);
bool IsConsecutive = true;
for (unsigned I = 1, E = VL.size(); I < E; ++I) {
  IsConsecutive &= (I == *getInsertIndex(VL[I], 0) - Offset);
  Operands[I] = cast<Instruction>(VL[I])->getOperand(1);
}
2899

If not consecutive, we can include the cost of a single permute and still vectorize it, no?

3810–3811

Hm, I think you need to normalize the insert indices here; the MinIndex (offset) might be non-zero but should be 0. Plus, I think we need to add (subtract from the vector cost) the cost of a subvector insertion at index MinIndex (if MinIndex != 0).

4380–4384

I think this should be:

ExtractCost += TTI->getShuffleCost(TTI::SK_InsertSubvector, VL0->getType(), MinIndex, VecTy);
5369

What if the user is a phi in the middle of other phis?

5375–5377

You're using this code in many places, make it a function.

5387–5388

Why can't we just use V here?

anton-afanasyev marked 6 inline comments as done.May 10 2021, 5:21 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2887–2897

Yes, sure, after refactoring we expect consecutive operands here and this can be simplified. Done this way.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

The inserts here come from findBuildAggregate() and are already sorted by index and therefore consecutive; the only exception is that they can have gaps, so we check for that here. I don't think we should handle this rare case within this patch.

3810–3811

Why do we need to normalize the insert indices? DemandedElts is passed to getScalarizationOverhead(), which just sums up the cost of all eliminated insertelements (with non-normalized operand indices).

4380–4384

I've removed all of this, since the vectorized value is now used without extraction.

5369

Oh, sure, fixed this.

5375–5377

I've just gotten rid of this code, using the fact that Scalars are already sorted by index.

5387–5388

Hmm, here we were preparing a shuffled V with undef elements for the case where V is used without several of the elements inserted (before the vectorizer). But undefined positions accept anything, so we can actually use the completely filled V here. Thanks, I've changed it this way and removed all the redundant code!

The only rare case would be if some later instruction inserts an element that was already inserted (with the same index), so we have to exclude this case. But uniqueness is guaranteed for inserts coming from findBuildAggregate().

anton-afanasyev marked 6 inline comments as done.

Address comments, refactor

ABataev added inline comments.May 10 2021, 5:52 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

Can we check for the gaps during the analysis of the build vector sequence instead?

3810–3811

Ah, you're getting it for SrcVecTy, got it.

4362–4364

I don't think this is correct. I think you need to use code from my patch D101555 for better cost estimation here.

5326

ExtractAndExtendIfNeeded

anton-afanasyev marked 4 inline comments as done.May 10 2021, 12:11 PM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

I've decided to check it here, since we may get other sources of the incoming insert bundle in the future.

4362–4364

Why? We don't generate any extract instructions for external uses of the vector, so there is no need to cost them.

I think you mixed two different cases up.
The patch D101555 you referenced is about cost estimation when the insert is the _user_, but here the insert is the one being _used_. We don't really need to "extract" it, since its user uses the vector value rather than a scalar one.

Also, I don't think we need to use the code from D101555 in this patch, since it does the same thing in a different way. The main idea of this patch is to unify the way we process inserts (the only vector tree node for now) with ordinary tree nodes. The inserts are a tree node now, and they are sorted by index, so there is no need to shuffle them. We could use ReorderIndices if needed, but there is no need, since the operands are sorted as well.

5326

Sure, done.

anton-afanasyev marked 3 inline comments as done.

Address comments (minor)

ABataev added inline comments.May 10 2021, 12:28 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

Then there is no need to iterate through the whole list; use an early exit out of the loop with cancelled scheduling and a gather node.

4362–4364

I think you're missing a shuffle cost here. If the external user is a vector and the extracted element is inserted into a different lane, it is at least a shuffle and we need to add its cost.

anton-afanasyev marked an inline comment as done.May 10 2021, 11:00 PM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4362–4364

I think you're missing a shuffle cost here.

The cost of what shuffle? We don't generate any shuffle.

If the external user is a vector ...

Not "the external user is a vector", but its operand is a vector. We do not need to extract any special lane, since _whole_ vector is using and replacing after "vectorization". In this special case (when tree node is inserts) we have vector "scalars" (inserts have vector type) and their "vectorization" is just removing (i.e. replacing by vectorized operands).

ABataev added inline comments.May 11 2021, 4:14 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3813–3816

Just:

return -TTI->getScalarizationOverhead(SrcVecTy, DemandedElts,
                                            /*Insert*/ true, /*Extract*/ false);
4362–4364

Ok, I see. You correctly excluded the cost of the final (sub)vector reuse.
But I suggest improving the cost model for inserts.

i1 = insertelement undef, v0
i2 = insertelement i1, v1
....
ii1 = insertelement undef, v1

If <v0,v1> gets vectorized, you need to count the cost of the extract of v1. Instead, we can count it as a shuffle and build the final shuffle. That can be much more profitable than relying on extractelement costs/instructions. But probably this can be addressed in the next patch.
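
A sketch of that suggestion (the types, lane indices, and value names are illustrative; the example above omits them):

; Relying on an extract of v1 from the vectorized pair %v01 = <v0, v1>:
%e = extractelement <2 x float> %v01, i32 1
%ii1 = insertelement <4 x float> undef, float %e, i32 0
; Counted and built as a shuffle of the vectorized value instead:
%ii1.shuf = shufflevector <2 x float> %v01, <2 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>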

4871–4879

You can do this check yourself. We have something similar for extracts, where we check whether we need to perform a shuffle/copy of a subvector. But we definitely need it.

anton-afanasyev marked 6 inline comments as done.May 11 2021, 7:19 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3813–3816

I changed this code to use a Cost variable.

4362–4364

Ok, I see. Though I don't see how this case is addressed within D101555: the two inserts i1 and ii1 cannot be caught into the same tree, so they don't occur in one InsertUses.

Anyway, I suggest addressing this in a separate patch. It's a rather rare case in practice.

4871–4879

You can do this check yourself. We have something similar for extracts, where we check whether we need to perform a shuffle/copy of a subvector. But we definitely need it.

Ok, I've used something similar for inserts as for extracts. Do you think it's better to make this check within getShuffleCost() in X86TargetTransformInfo.cpp and AArch64TargetTransformInfo.cpp?

anton-afanasyev marked 3 inline comments as done.

Address comments

ABataev added inline comments.May 11 2021, 7:41 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

Not done, you can exit early out of the loop if a non-consecutive insert is found:

int Offset = *getInsertIndex(VL[0], 0);
ValueList Operands(VL.size());
Operands[0] = cast<Instruction>(VL[0])->getOperand(1);
for (unsigned I = 1, E = VL.size(); I < E; ++I) {
  if (I != *getInsertIndex(VL[I], 0) - Offset) {
    LLVM_DEBUG(dbgs() << "SLP: skipping non-consecutive inserts.\n");
    BS.cancelScheduling(VL, VL0);
    buildTree_rec(Operands, Depth, UserTreeIdx);
    return;
  }
  Operands[I] = cast<Instruction>(VL[I])->getOperand(1);
}
TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx);
LLVM_DEBUG(dbgs() << "SLP: added inserts bundle.\n");
TE->setOperand(0, Operands);

ValueList VectorOperands;
for (Value *V : VL)
  VectorOperands.push_back(cast<Instruction>(V)->getOperand(0));

TE->setOperand(1, VectorOperands);

buildTree_rec(Operands, Depth + 1, {TE, 0});
return;
4871–4879

It would be good. Probably in a separate patch after this one

ABataev added inline comments.May 11 2021, 7:43 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4362–4364

Actually, D101555 addresses exactly the described problem. In some cases it really improves the performance, especially for SSE/AVX/AVX2 targets. I will update it once this patch has landed.

anton-afanasyev marked 4 inline comments as done.May 11 2021, 9:43 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

This code is not what was intended, since we exit tree building early with incompletely filled Operands.

4871–4879

Ok, I'll do it in a separate patch.

ABataev added inline comments.May 11 2021, 9:46 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

I meant newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);, just like we're doing for other nodes.

anton-afanasyev marked 3 inline comments as done.May 11 2021, 10:00 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

Hmm, buildTree_rec() here (with completely filled Operands) is intended: if we skip vectorization of non-consecutive inserts, we still try to vectorize starting from Operands (as it was before this patch). I think it's a rare case where such operands could be a successful seed for a vectorizable tree, but why not try?

ABataev added inline comments.May 11 2021, 10:12 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

Hmm, does it mean you're going to support something like this:

i1 = insertelement undef, v0, 0
i2 = insertelement i1, v1, 2

|
V

v = <v0, v1>
i2 = shuffle v, undef, <0, undef, 1, undef>

? How do we handle shuffle in this case? As external uses and inserts of extractelements? And we do not subtract the costs of insertelements in this case?

anton-afanasyev marked 2 inline comments as done.May 11 2021, 11:10 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

Yes, for that case we end up with the previous combination of inserts/extracts, and we don't subtract the cost of the inserts/extracts that will be eliminated by instcombine later, but that's better than nothing if the total cost is still good.
I've checked that neither my version nor yours affects any test case; it all looks like a rather speculative and rare case, though.

ABataev added inline comments.May 11 2021, 11:25 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

At least add a FIXME to properly support this kind of vectorization in the future

anton-afanasyev marked 2 inline comments as done.May 11 2021, 11:33 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2899

Ok, added

anton-afanasyev marked an inline comment as done.

Add FIXME for non-consecutive insert case

Can you please rebase against trunk? At least some of these test diffs should go away, as I've regenerated them.

ABataev accepted this revision.May 12 2021, 3:48 AM

Looks good with a nit

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3819

I think we need to check that NumElts != NumScalars

RKSimon accepted this revision.May 12 2021, 4:16 AM

LGTM (to unblock this)

This revision is now accepted and ready to land.May 12 2021, 4:16 AM
anton-afanasyev marked an inline comment as done.May 12 2021, 4:33 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3819

I've removed this check since Offset % NumScalars != 0 implies NumElts != NumScalars. Do you think we need the check for readability?

ABataev added inline comments.May 12 2021, 4:45 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3819

I would add this check just in case

anton-afanasyev marked 2 inline comments as done.May 12 2021, 4:55 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3819

Ok, added.

anton-afanasyev marked an inline comment as done.

Address a nit

This patch introduces an assertion error we believe may be contributing to a miscompile (along with some other recent SLP patches -- this patch fixes the reduced case in http://llvm.org/PR50323, but doesn't fix the full case it was reduced from):

$ opt reduced.ll -disable-output -O1 -slp-vectorizer  # See below for reduced.ll
opt: /home/rupprecht/src/llvm-project/llvm/lib/IR/Type.cpp:648: static llvm::FixedVectorType *llvm::FixedVectorType::get(llvm::Type *, unsigned int): Assertion `isValidElementType(ElementType) && "Element type of a VectorType must " "be an integer, floating point, or " "pointer type."' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/rupprecht/dev/opt reduced.ll -disable-output -O1 -slp-vectorizer
...
#10 0x000000000697c8f8 llvm::FixedVectorType::get(llvm::Type*, unsigned int) /home/rupprecht/src/llvm-project/llvm/lib/IR/Type.cpp:650:36
#11 0x0000000007752a5e llvm::slpvectorizer::BoUpSLP::getSpillCost() const /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4321:21
#12 0x0000000007753060 llvm::slpvectorizer::BoUpSLP::getTreeCost() /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4384:31
#13 0x000000000775f888 llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6740:32
#14 0x0000000007760dab llvm::SLPVectorizerPass::vectorizeInsertElementInst(llvm::InsertElementInst*, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7844:3
#15 0x0000000007760f84 llvm::SLPVectorizerPass::vectorizeSimpleInstructions(llvm::SmallVectorImpl<llvm::Instruction*>&, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&, bool) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7858:21
#16 0x000000000775d5d8 llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8019:21
#17 0x000000000775c6d3 llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6395:16
#18 0x000000000775c26f llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6332:8
...
$ cat reduced.ll
; ModuleID = 'reduced.ll'
source_filename = "repro.cc"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.widget = type { %struct.baz }
%struct.baz = type { double, double }
%struct.snork = type { <2 x double> }
%struct.spam = type { %struct.snork }

$_ZN1dC2Edd = comdat any

$_ZN1k1lE1d = comdat any

$_ZN1d1hES_ = comdat any

$_ZN1d1fEv = comdat any

$_ZN1d1eEv = comdat any

@global = external global %struct.widget, align 8

define <2 x double> @zot(%struct.widget* %arg, %struct.baz* %arg1) align 2 {
bb:
  %tmp = alloca %struct.snork, align 16
  %tmp2 = alloca %struct.widget*, align 8
  %tmp3 = alloca %struct.baz*, align 8
  store %struct.widget* %arg, %struct.widget** %tmp2, align 8, !tbaa !0
  store %struct.baz* %arg1, %struct.baz** %tmp3, align 8, !tbaa !0
  %tmp4 = load %struct.widget*, %struct.widget** %tmp2, align 8
  %tmp5 = load %struct.baz*, %struct.baz** %tmp3, align 8, !tbaa !0
  %tmp6 = getelementptr inbounds %struct.baz, %struct.baz* %tmp5, i32 0, i32 1
  %tmp7 = load double, double* %tmp6, align 8, !tbaa !4
  %tmp8 = getelementptr inbounds %struct.widget, %struct.widget* %tmp4, i32 0, i32 0
  %tmp9 = getelementptr inbounds %struct.baz, %struct.baz* %tmp8, i32 0, i32 1
  %tmp10 = load double, double* %tmp9, align 8, !tbaa !7
  %tmp11 = fsub double %tmp7, %tmp10
  %tmp12 = load %struct.baz*, %struct.baz** %tmp3, align 8, !tbaa !0
  %tmp13 = getelementptr inbounds %struct.baz, %struct.baz* %tmp12, i32 0, i32 0
  %tmp14 = load double, double* %tmp13, align 8, !tbaa !9
  %tmp15 = getelementptr inbounds %struct.widget, %struct.widget* %tmp4, i32 0, i32 0
  %tmp16 = getelementptr inbounds %struct.baz, %struct.baz* %tmp15, i32 0, i32 0
  %tmp17 = load double, double* %tmp16, align 8, !tbaa !10
  %tmp18 = fsub double %tmp14, %tmp17
  call void @wombat(%struct.snork* %tmp, double %tmp11, double %tmp18)
  %tmp19 = getelementptr inbounds %struct.snork, %struct.snork* %tmp, i32 0, i32 0
  %tmp20 = load <2 x double>, <2 x double>* %tmp19, align 16
  ret <2 x double> %tmp20
}

define linkonce_odr void @wombat(%struct.snork* %arg, double %arg1, double %arg2) unnamed_addr comdat($_ZN1dC2Edd) align 2 {
bb:
  %tmp = alloca %struct.snork*, align 8
  %tmp3 = alloca double, align 8
  %tmp4 = alloca double, align 8
  store %struct.snork* %arg, %struct.snork** %tmp, align 8, !tbaa !0
  store double %arg1, double* %tmp3, align 8, !tbaa !11
  store double %arg2, double* %tmp4, align 8, !tbaa !11
  %tmp5 = load %struct.snork*, %struct.snork** %tmp, align 8
  %tmp6 = getelementptr inbounds %struct.snork, %struct.snork* %tmp5, i32 0, i32 0
  %tmp7 = load double, double* %tmp3, align 8, !tbaa !11
  %tmp8 = insertelement <2 x double> undef, double %tmp7, i32 0
  %tmp9 = load double, double* %tmp4, align 8, !tbaa !11
  %tmp10 = insertelement <2 x double> %tmp8, double %tmp9, i32 1
  store <2 x double> %tmp10, <2 x double>* %tmp6, align 16, !tbaa !12
  ret void
}

define double @wombat.1() {
bb:
  %tmp = alloca %struct.widget, align 8
  %tmp1 = alloca %struct.spam, align 16
  %tmp2 = alloca %struct.snork, align 16
  %tmp3 = alloca %struct.baz, align 8
  %tmp4 = bitcast %struct.widget* %tmp to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp4, i8* bitcast (%struct.widget* @global to i8*), i64 16, i1 false), !tbaa.struct !13
  %tmp5 = bitcast %struct.spam* %tmp1 to i8*
  call void @llvm.memset.p0i8.i64(i8* %tmp5, i8 0, i64 16, i1 false)
  call void @quux()
  %tmp6 = getelementptr inbounds %struct.baz, %struct.baz* %tmp3, i32 0, i32 0
  store double 0.000000e+00, double* %tmp6, align 8, !tbaa !9
  %tmp7 = getelementptr inbounds %struct.baz, %struct.baz* %tmp3, i32 0, i32 1
  store double 0.000000e+00, double* %tmp7, align 8, !tbaa !4
  %tmp8 = call <2 x double> @zot(%struct.widget* %tmp, %struct.baz* %tmp3)
  %tmp9 = getelementptr inbounds %struct.snork, %struct.snork* %tmp2, i32 0, i32 0
  store <2 x double> %tmp8, <2 x double>* %tmp9, align 16
  %tmp10 = getelementptr inbounds %struct.snork, %struct.snork* %tmp2, i32 0, i32 0
  %tmp11 = load <2 x double>, <2 x double>* %tmp10, align 16
  %tmp12 = call double @wobble(%struct.spam* %tmp1, <2 x double> %tmp11)
  ret double %tmp12
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #0

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

; Function Attrs: argmemonly nofree nosync nounwind willreturn writeonly
declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg) #1

declare void @quux() unnamed_addr align 2

define linkonce_odr double @wobble(%struct.spam* %arg, <2 x double> %arg1) comdat($_ZN1k1lE1d) align 2 {
bb:
  %tmp = alloca %struct.snork, align 16
  %tmp2 = alloca %struct.spam*, align 8
  %tmp3 = alloca %struct.snork, align 16
  %tmp4 = alloca %struct.snork, align 16
  %tmp5 = getelementptr inbounds %struct.snork, %struct.snork* %tmp, i32 0, i32 0
  store <2 x double> %arg1, <2 x double>* %tmp5, align 16
  store %struct.spam* %arg, %struct.spam** %tmp2, align 8, !tbaa !0
  %tmp6 = load %struct.spam*, %struct.spam** %tmp2, align 8
  %tmp7 = getelementptr inbounds %struct.spam, %struct.spam* %tmp6, i32 0, i32 0
  %tmp8 = bitcast %struct.snork* %tmp3 to i8*
  %tmp9 = bitcast %struct.snork* %tmp7 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp8, i8* %tmp9, i64 16, i1 false), !tbaa.struct !14
  %tmp10 = bitcast %struct.snork* %tmp4 to i8*
  %tmp11 = bitcast %struct.snork* %tmp to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp10, i8* %tmp11, i64 16, i1 false), !tbaa.struct !14
  %tmp12 = getelementptr inbounds %struct.snork, %struct.snork* %tmp3, i32 0, i32 0
  %tmp13 = load <2 x double>, <2 x double>* %tmp12, align 16
  %tmp14 = getelementptr inbounds %struct.snork, %struct.snork* %tmp4, i32 0, i32 0
  %tmp15 = load <2 x double>, <2 x double>* %tmp14, align 16
  %tmp16 = call double @eggs(<2 x double> %tmp13, <2 x double> %tmp15)
  ret double %tmp16
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #0

define linkonce_odr double @eggs(<2 x double> %arg, <2 x double> %arg1) align 2 {
bb:
  %tmp = alloca %struct.snork, align 16
  %tmp2 = alloca %struct.snork, align 16
  %tmp3 = alloca %struct.snork, align 16
  %tmp4 = getelementptr inbounds %struct.snork, %struct.snork* %tmp, i32 0, i32 0
  store <2 x double> %arg, <2 x double>* %tmp4, align 16
  %tmp5 = getelementptr inbounds %struct.snork, %struct.snork* %tmp2, i32 0, i32 0
  store <2 x double> %arg1, <2 x double>* %tmp5, align 16
  %tmp6 = bitcast %struct.snork* %tmp3 to i8*
  %tmp7 = bitcast %struct.snork* %tmp2 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp6, i8* %tmp7, i64 16, i1 false), !tbaa.struct !14
  %tmp8 = getelementptr inbounds %struct.snork, %struct.snork* %tmp3, i32 0, i32 0
  %tmp9 = load <2 x double>, <2 x double>* %tmp8, align 16
  %tmp10 = call double @wobble.2(%struct.snork* %tmp, <2 x double> %tmp9)
  ret double %tmp10
}

define linkonce_odr double @wobble.2(%struct.snork* %arg, <2 x double> %arg1) comdat($_ZN1d1hES_) align 2 {
bb:
  %tmp = alloca %struct.snork, align 16
  %tmp2 = alloca %struct.snork*, align 8
  %tmp3 = alloca %struct.snork, align 16
  %tmp4 = getelementptr inbounds %struct.snork, %struct.snork* %tmp, i32 0, i32 0
  store <2 x double> %arg1, <2 x double>* %tmp4, align 16
  store %struct.snork* %arg, %struct.snork** %tmp2, align 8, !tbaa !0
  %tmp5 = load %struct.snork*, %struct.snork** %tmp2, align 8
  %tmp6 = call double @quux.3(%struct.snork* %tmp)
  %tmp7 = call double @zot.4(%struct.snork* %tmp)
  call void @wombat(%struct.snork* %tmp3, double %tmp6, double %tmp7)
  %tmp8 = getelementptr inbounds %struct.snork, %struct.snork* %tmp5, i32 0, i32 0
  %tmp9 = load <2 x double>, <2 x double>* %tmp8, align 16, !tbaa !12
  %tmp10 = getelementptr inbounds %struct.snork, %struct.snork* %tmp3, i32 0, i32 0
  %tmp11 = load <2 x double>, <2 x double>* %tmp10, align 16, !tbaa !12
  %tmp12 = fmul <2 x double> %tmp11, %tmp9
  store <2 x double> %tmp12, <2 x double>* %tmp10, align 16, !tbaa !12
  %tmp13 = call double @zot.4(%struct.snork* %tmp3)
  %tmp14 = call double @quux.3(%struct.snork* %tmp3)
  %tmp15 = fsub double %tmp13, %tmp14
  ret double %tmp15
}

define linkonce_odr double @quux.3(%struct.snork* %arg) comdat($_ZN1d1fEv) align 2 {
bb:
  %tmp = alloca %struct.snork*, align 8
  store %struct.snork* %arg, %struct.snork** %tmp, align 8, !tbaa !0
  %tmp1 = load %struct.snork*, %struct.snork** %tmp, align 8
  %tmp2 = getelementptr inbounds %struct.snork, %struct.snork* %tmp1, i32 0, i32 0
  %tmp3 = load <2 x double>, <2 x double>* %tmp2, align 16, !tbaa !12
  %tmp4 = extractelement <2 x double> %tmp3, i32 1
  ret double %tmp4
}

define linkonce_odr double @zot.4(%struct.snork* %arg) comdat($_ZN1d1eEv) align 2 {
bb:
  %tmp = alloca %struct.snork*, align 8
  store %struct.snork* %arg, %struct.snork** %tmp, align 8, !tbaa !0
  %tmp1 = load %struct.snork*, %struct.snork** %tmp, align 8
  %tmp2 = getelementptr inbounds %struct.snork, %struct.snork* %tmp1, i32 0, i32 0
  %tmp3 = load <2 x double>, <2 x double>* %tmp2, align 16, !tbaa !12
  %tmp4 = extractelement <2 x double> %tmp3, i32 0
  ret double %tmp4
}

attributes #0 = { argmemonly nofree nosync nounwind willreturn }
attributes #1 = { argmemonly nofree nosync nounwind willreturn writeonly }

!0 = !{!1, !1, i64 0}
!1 = !{!"any pointer", !2, i64 0}
!2 = !{!"omnipotent char", !3, i64 0}
!3 = !{!"Simple C++ TBAA"}
!4 = !{!5, !6, i64 8}
!5 = !{!"_ZTS1a", !6, i64 0, !6, i64 8}
!6 = !{!"double", !2, i64 0}
!7 = !{!8, !6, i64 8}
!8 = !{!"_ZTS1p", !5, i64 0}
!9 = !{!5, !6, i64 0}
!10 = !{!8, !6, i64 0}
!11 = !{!6, !6, i64 0}
!12 = !{!2, !2, i64 0}
!13 = !{i64 0, i64 8, !11, i64 8, i64 8, !11}
!14 = !{i64 0, i64 16, !12}

(Sorry for the length -- this is as far as llvm-reduce would take it)

This patch introduces an assertion error we believe may be contributing to a miscompile (along with some other recent SLP patches -- this patch fixes the reduced case in http://llvm.org/PR50323, but doesn't fix the full case it was reduced from):

Thanks for the report, fixed here: https://reviews.llvm.org/rG207cdd7ed9fc

Hi, there is another issue that can be reproduced with an existing test case:

$opt -S -slp-vectorizer -slp-threshold=-10000 test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll -slp-min-tree-size=0
opt: .../llvm-project/llvm/lib/IR/Type.cpp:648: static llvm::FixedVectorType* llvm::FixedVectorType::get(llvm::Type*, unsigned int): Assertion `isValidElementType(ElementType) && "Element type of a VectorType must " "be an integer, floating point, or " "pointer type."' failed.

Thanks,

Valery
dyung added a subscriber: dyung.May 17 2021, 4:42 PM

Hi, this change has caused a regression in the codegen for one of our internal tests. Consider the following code:

__attribute__((noinline))
__m256d add_pd_002(__m256d a, __m256d b) {
  __m256d r = (__m256d){ a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
  return __builtin_shufflevector(r, a, 0, -1, 2, 3);
}

If you compile this with "-g0 -O3 -march=btver2", prior to your commit the compiler would generate the following code for the function:

# %bb.0:                                # %entry
        vinsertf128     $1, %xmm1, %ymm0, %ymm0
        vhaddpd %ymm1, %ymm0, %ymm0

But after your change it is now generating the following code:

# %bb.0:                                # %entry
        vextractf128    $1, %ymm1, %xmm2
        vhaddpd %xmm0, %xmm0, %xmm0
        vhaddpd %xmm2, %xmm1, %xmm1
        vperm2f128      $2, %ymm0, %ymm1, %ymm0 # ymm0 = ymm0[0,1],ymm1[0,1]

From your commit description, it sounds like this is expected and will be fixed in a follow-up commit. Is my understanding of this correct?

> Hi, this change has caused a regression in the codegen for one of our internal tests. [...] From your commit description, it sounds like this is expected and will be fixed in a follow-up commit. Is my understanding of this correct?

Hi, could you check if D101555 fixes this issue?

> Hi, there is another issue that can be reproduced with an existing test case: [...]

Thanks for the report; fixed here, needs a quick review: https://reviews.llvm.org/D102675

dyung added a comment.May 18 2021, 2:17 AM

> Hi, this change has caused a regression in the codegen for one of our internal tests. [...]
> Hi, could you check if D101555 fixes this issue?

Hi, I applied the patch locally and built the compiler, but the generated assembly actually seems like it might be worse:

# %bb.0:                                # %entry
        vinsertf128     $1, %xmm0, %ymm1, %ymm2
        vinsertf128     $1, %xmm1, %ymm0, %ymm0
        vextractf128    $1, %ymm1, %xmm1
        vhaddpd %ymm2, %ymm0, %ymm0
        vhaddpd %xmm1, %xmm1, %xmm1
        vextractf128    $1, %ymm0, %xmm2
        vunpcklpd       %xmm1, %xmm2, %xmm1     # xmm1 = xmm2[0],xmm1[0]
        vinsertf128     $1, %xmm1, %ymm0, %ymm0
        retq

> Hi, I applied the patch locally and built the compiler, but the generated assembly actually seems like it might be worse: [...]

Ok, thanks, will fix it later today.

> Hi, this change has caused a regression in the codegen for one of our internal tests. [...] From your commit description, it sounds like this is expected and will be fixed in a follow-up commit. Is my understanding of this correct?

I see a different result for btver2.

# %bb.0:
        vextractf128    $1, %ymm1, %xmm2
        vhaddpd %xmm1, %xmm0, %xmm0
        vhaddpd %xmm2, %xmm1, %xmm1
        vinsertf128     $1, %xmm1, %ymm0, %ymm0

But I used llvm-11, most probably there is a difference with llvm-12.

Currently, it is impossible to fix this issue. This problem will be fixed after non-power-of-2 vectorization support in SLP has landed, since here we have a build vector of 3 elements (the second index in the shuffle is -1, thus the second sum is optimized out, resulting in the <4 x double> vector being built using just 3 insertelement instructions). It looks like previously this pattern was recognized by another transformation pass, but currently SLP tries to vectorize it.
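
Roughly what SLP sees here (an illustrative sketch derived from the C snippet above, not the exact IR): the unused lane-1 sum is optimized out, leaving a 3-element build of a <4 x double>:

%s0 = fadd double %a0, %a1      ; a[0] + a[1]
%s2 = fadd double %b0, %b1      ; b[0] + b[1]
%s3 = fadd double %b2, %b3      ; b[2] + b[3]
%i0 = insertelement <4 x double> undef, double %s0, i32 0
%i2 = insertelement <4 x double> %i0, double %s2, i32 2
%i3 = insertelement <4 x double> %i2, double %s3, i32 3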

> I see a different result for btver2. [...] It looks like previously this pattern was recognized by another transformation pass, but currently SLP tries to vectorize it.

The new result I posted in your quote was produced by a compiler built with this change. It is unfortunate to hear that we will have to take a regression for this, but I will update our internal test to expect it and file a bug so that it is not forgotten.