This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/test/Transforms/SLPVectorizer/ (tests in AArch64/, AMDGPU/, NVPTX/, SystemZ/ and X86/, plus int_sideeffect.ll)

Differential D118538

[SLP] Schedule only sub-graph of vectorizable instructions
ClosedPublic

Authored by reames on Jan 29 2022, 7:50 AM.

Details

Reviewers

ABataev
fhahn
nikic

Commits

rG48cc9287f555: Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3)
rG738042711bc0: Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""
rG0539a26d91a1: [SLP] Schedule only sub-graph of vectorizable instructions

Summary

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

 704357 SLP                          - Number of calcDeps actions
 699021 SLP                          - Number of schedule calls
   5598 SLP                          - Number of ReSchedule actions
     59 SLP                          - Number of ReScheduleOnFail actions
  10084 SLP                          - Number of schedule resets
   8523 SLP                          - Number of vector instructions generated

After this patch:

 102895 SLP                          - Number of calcDeps actions
 161916 SLP                          - Number of schedule calls
   5637 SLP                          - Number of ReSchedule actions
     55 SLP                          - Number of ReScheduleOnFail actions
  10083 SLP                          - Number of schedule resets
   8403 SLP                          - Number of vector instructions generated
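To put the two counter dumps side by side, here is a minimal sketch of the relative change. The helper name `percentReduction` is made up for this illustration and is not part of the patch.

```cpp
#include <cassert>

// Percentage by which a counter dropped between the "before" and "after"
// runs. Hypothetical helper, for illustration only.
double percentReduction(double Before, double After) {
  assert(Before > 0 && "counter must be positive");
  return 100.0 * (Before - After) / Before;
}
```

Fed the figures above, this gives a drop of roughly 85% in calcDeps actions (704357 to 102895), roughly 77% in schedule calls (699021 to 161916), and about 1.4% in generated vector instructions (8523 to 8403).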

I do want to highlight that there is a small difference in number of generated vector instructions. I have to admit I'm confused by this, as in theory, the scheduling should not change this at all. I have not been able to reduce an example that differs, and I don't see such a change in the tests. Any ideas what might be going on here?

(Edit: I think I see the cause. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the ReScheduleOnFail counter change. Given that, I think we can safely ignore it.)

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.
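As a rough illustration of what "the instructions being vectorized and their transitive users" means, here is a minimal sketch over a toy use graph. It is not the code in SLPVectorizer.cpp: `UserGraph` and `collectSchedulingSubgraph` are hypothetical names, instructions are plain indices in program order, and the real pass additionally restricts the walk to the scheduling region.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Toy model (an assumption for illustration): instructions of one basic
// block are numbered 0..N-1, and Users[I] lists the instructions that use
// the value produced by instruction I.
using UserGraph = std::vector<std::vector<std::size_t>>;

// Returns one flag per instruction: true if it is a vectorization candidate
// or a transitive user of one, i.e. part of the sub-graph to be scheduled.
std::vector<bool>
collectSchedulingSubgraph(const UserGraph &Users,
                          const std::vector<std::size_t> &Candidates) {
  std::vector<bool> InSubgraph(Users.size(), false);
  std::queue<std::size_t> Worklist;
  for (std::size_t C : Candidates) {
    assert(C < Users.size() && "candidate index out of range");
    if (!InSubgraph[C]) {
      InSubgraph[C] = true;
      Worklist.push(C);
    }
  }
  // Breadth-first walk along use edges.
  while (!Worklist.empty()) {
    std::size_t I = Worklist.front();
    Worklist.pop();
    for (std::size_t U : Users[I]) {
      if (!InSubgraph[U]) {
        InSubgraph[U] = true;
        Worklist.push(U);
      }
    }
  }
  return InSubgraph;
}
```

Everything left unmarked is simply never touched by the scheduler, which is where the savings in large blocks come from.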

Event Timeline

reames created this revision. Jan 29 2022, 7:50 AM

Herald added subscribers: asavonic, kerbowa, bollu and 5 others. Jan 29 2022, 7:50 AM

reames requested review of this revision. Jan 29 2022, 7:50 AM

Herald added a project: Restricted Project. Jan 29 2022, 7:50 AM

reames added a parent revision: D117952: [SLP] Restructure RescheduleHandling [NFC]. Jan 29 2022, 7:51 AM

reames edited the summary of this revision. Jan 29 2022, 8:52 AM

Harbormaster completed remote builds in B146458: Diff 404270. Jan 29 2022, 9:28 AM

ABataev added inline comments. Jan 31 2022, 5:41 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7905	Can we keep this assert here or replace it with another one? It helps in many cases with incorrect scheduling.

reames added inline comments. Jan 31 2022, 11:23 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7905	Not easily. We'd need to track the increments through the calls to calculateDependencies since the set size now depends on the transitive use walk. I get why you want this, but I don't see an easy way to preserve it. Do you think it's worth the complexity of plumbing an assert only param through calculateDependencies?

ABataev added inline comments. Jan 31 2022, 12:19 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7905	No, sure not. But can you try to implement something simple here?

reames added inline comments. Jan 31 2022, 12:25 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7905	Did you have something particular in mind? Not trying to be difficult, I just don't see a simple assert here.

Rebase and remove dependency on NFC changes. This should be able to land either before or after those.

Harbormaster completed remote builds in B147436: Diff 405696. Feb 3 2022, 11:48 AM

reames added inline comments. Feb 4 2022, 1:20 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7905	Thinking about this, I could at least add an assert that all of the bundled instructions were scheduled. That wouldn't handle transitive users of the vector tree, but it would be better than nothing. Would that satisfy you?

ping

ABataev added inline comments. Feb 11 2022, 9:55 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7905	This code is very sensitive and as you already noted it might be very useful to keep this (or similar) assert. Could you add an assert for bundled instructions and transitive users to be absolutely sure that neither this patch, nor future ones, break anything in scheduling?
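The kind of check being asked for here can be sketched as follows. This is a toy model, assuming a per-instruction record with two flags; `ToyScheduleData`, `allExpectedScheduled` and `verifyRegion` are hypothetical names, and the real ScheduleData in SLPVectorizer.cpp tracks far more state than this.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for per-instruction scheduling state.
struct ToyScheduleData {
  bool InSubgraph = false;  // bundled instruction or transitive user
  bool IsScheduled = false; // set once the scheduler has placed it
};

// True if every instruction that was expected to be scheduled actually was.
bool allExpectedScheduled(const std::vector<ToyScheduleData> &Region) {
  for (const ToyScheduleData &SD : Region)
    if (SD.InSubgraph && !SD.IsScheduled)
      return false;
  return true;
}

// Intended use: call once after scheduling a region has finished.
void verifyRegion(const std::vector<ToyScheduleData> &Region) {
  assert(allExpectedScheduled(Region) &&
         "expected instruction was not scheduled");
  (void)Region; // avoid an unused-parameter warning when asserts are off
}
```

Instructions outside the sub-graph are deliberately ignored by the check, since with this patch they are no longer scheduled at all.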

reames mentioned this in rG2e507607754c: [SLP] Add assert that entities are scheduled as expected. Feb 15 2022, 12:34 PM

Rebase over 2e507607 which includes the requested scheduling assert.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7905	Landed in 2e507607, patch rebased. This turned out much simpler than I'd pictured, and is clearly warranted. Thank you for pushing on this.

ABataev added inline comments. Feb 15 2022, 1:07 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7905	Will try to test it tomorrow.

Harbormaster completed remote builds in B149800: Diff 409015. Feb 15 2022, 3:18 PM

ping

In D118538#3332370, @reames wrote:

ping

I asked in rG2e507607754c25fae82c35d93d2ab53395be6ff8, duplicating here. Can you enable those extra checks for asserts build with EXPENSIVE_CHECKS?

In D118538#3332371, @ABataev wrote:

In D118538#3332370, @reames wrote:

ping

I asked in rG2e507607754c25fae82c35d93d2ab53395be6ff8, duplicating here. Can you enable those extra checks for asserts build with EXPENSIVE_CHECKS?

I just replied there.

This revision is now accepted and ready to land. Feb 22 2022, 7:41 AM

This revision was landed with ongoing or failed builds. Feb 22 2022, 10:16 AM

Closed by commit rG0539a26d91a1: [SLP] Schedule only sub-graph of vectorizable instructions (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG0539a26d91a1: [SLP] Schedule only sub-graph of vectorizable instructions.

Hi, we've bisected some win32 miscompiles to this patch: https://crbug.com/1300322.
It seems like some inalloca allocas are getting moved around relative to their corresponding llvm.stacksave/stackrestore.
Not being familiar with SLPVectorizer, I'm not sure if this patch caused it or exposed some existing issue. Any advice?

rnk added a subscriber: rnk. Mar 1 2022, 2:08 PM

Herald added a project: Restricted Project. Mar 1 2022, 2:08 PM

Based on some debug logs, it seems like the llvm.stacksave()/llvm.stackrestore()s are moving around, not the allocas.

Better reduced repro:

$ cat /tmp/a.ll
target datalayout = "e-m:x-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32-a:0:32-S32"
target triple = "i386-pc-windows-msvc19.16.0"

declare i8* @llvm.stacksave()

declare void @llvm.stackrestore(i8*)

declare i8* @wibble(i8*)

declare void @quux(i32* inalloca(i32))

define void @ham() #1 {
  %tmp2 = alloca i8
  %tmp3 = alloca i8
  %tmp4 = alloca i8
  %tmp5 = alloca i8
  %tmp12 = alloca [12 x i8*]
  %tmp15 = call i8* @wibble(i8* %tmp2)
  %tmp16 = call i8* @wibble(i8* %tmp3)
  %tmp17 = call i8* @wibble(i8* %tmp4)
  %tmp23 = call i8* @llvm.stacksave()
  %tmp24 = alloca inalloca i32
  call void @quux(i32* inalloca(i32) %tmp24)
  call void @llvm.stackrestore(i8* %tmp23)
  %tmp32 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 0
  store i8* %tmp4, i8** %tmp32
  %tmp33 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 1
  store i8* %tmp4, i8** %tmp33
  %tmp34 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 2
  store i8* %tmp4, i8** %tmp34
  %tmp35 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 3
  store i8* %tmp4, i8** %tmp35
  %tmp36 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 4
  store i8* %tmp4, i8** %tmp36
  %tmp37 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 5
  store i8* %tmp5, i8** %tmp37
  %tmp38 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 6
  store i8* %tmp5, i8** %tmp38
  %tmp39 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 7
  store i8* %tmp5, i8** %tmp39
  ret void
}

attributes #0 = { nofree nosync nounwind willreturn }
attributes #1 = { "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+sse3,+x87" }

$ bin/opt -passes=slp-vectorizer /tmp/a.ll -S
target datalayout = "e-m:x-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32-a:0:32-S32"
target triple = "i386-pc-windows-msvc19.16.0"

; Function Attrs: nofree nosync nounwind willreturn
declare i8* @llvm.stacksave() #0

; Function Attrs: nofree nosync nounwind willreturn
declare void @llvm.stackrestore(i8*) #0

declare i8* @wibble(i8*)

declare void @quux(i32* inalloca(i32))

define void @ham() #1 {
  %tmp2 = alloca i8, align 1
  %tmp3 = alloca i8, align 1
  %tmp12 = alloca [12 x i8*], align 4
  %tmp15 = call i8* @wibble(i8* %tmp2)
  %tmp16 = call i8* @wibble(i8* %tmp3)
  %tmp24 = alloca inalloca i32, align 4
  %tmp32 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 0
  %tmp33 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 1
  %tmp34 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 2
  %tmp35 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 3
  %1 = bitcast i8** %tmp32 to <4 x i8*>*
  %tmp36 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 4
  %tmp37 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 5
  %tmp38 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 6
  %tmp39 = getelementptr inbounds [12 x i8*], [12 x i8*]* %tmp12, i32 0, i32 7
  %tmp4 = alloca i8, align 1
  %tmp5 = alloca i8, align 1
  %tmp17 = call i8* @wibble(i8* %tmp4)
  %tmp23 = call i8* @llvm.stacksave()
  call void @quux(i32* inalloca(i32) %tmp24)
  call void @llvm.stackrestore(i8* %tmp23)
  %2 = insertelement <4 x i8*> poison, i8* %tmp4, i32 0
  %shuffle = shufflevector <4 x i8*> %2, <4 x i8*> poison, <4 x i32> zeroinitializer
  store <4 x i8*> %shuffle, <4 x i8*>* %1, align 4
  %3 = insertelement <4 x i8*> %2, i8* %tmp5, i32 1
  %shuffle1 = shufflevector <4 x i8*> %3, <4 x i8*> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
  %4 = bitcast i8** %tmp36 to <4 x i8*>*
  store <4 x i8*> %shuffle1, <4 x i8*>* %4, align 4
  ret void
}

attributes #0 = { nofree nosync nounwind willreturn }
attributes #1 = { "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+sse3,+x87" }

%tmp24 is no longer between the stacksave/stackrestore
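The property that broke in this repro can be stated as a small check over the linear instruction order. The sketch below is a toy model specific to this test case, not a general IR verifier rule; the `Kind` enum and the function name are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Toy opcode set, just enough to model the reduced test case above.
enum class Kind { Alloca, InAllocaAlloca, StackSave, StackRestore, Call, Other };

// Returns true if every inalloca alloca sits inside an open
// stacksave/stackrestore region in the given linear instruction order.
bool inallocaInsideSaveRestore(const std::vector<Kind> &Block) {
  int OpenRegions = 0;
  for (Kind K : Block) {
    if (K == Kind::StackSave) {
      ++OpenRegions;
    } else if (K == Kind::StackRestore) {
      if (OpenRegions > 0)
        --OpenRegions;
    } else if (K == Kind::InAllocaAlloca && OpenRegions == 0) {
      return false; // allocated outside any save/restore pair
    }
  }
  return true;
}
```

Modeled this way, the input function passes (stacksave, inalloca alloca, call, stackrestore) while the output of the pass fails, because %tmp24 is emitted well before %tmp23.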

Ideally we'd revert this patch, but there are a lot of patches on top of this and the revert isn't clean.

In D118538#3353147, @aeubanks wrote:

Ideally we'd revert this patch, but there are a lot of patches on top of this and the revert isn't clean.

I can try to do it tomorrow, if Philip won't revert it himself.

Actually this sequence seems to pass check-llvm with expensive checks on. Is this reasonable?
(only reverting the top two causes the test added in ee48b2f1c3f646a32557d21b7cf476466a8b8a2f to fail)

commit c0918b5685df34a1df56eaec9408b0053adce145 (HEAD)
Author: Arthur Eubanks <aeubanks@google.com>
Date:   Tue Mar 1 16:23:04 2022 -0800

    Revert "[SLP] Schedule only sub-graph of vectorizable instructions"
    
    This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d.

commit 97c3a3aea222b6cdc5972ebfa411a0f43ff45e7d
Author: Arthur Eubanks <aeubanks@google.com>
Date:   Tue Mar 1 16:22:59 2022 -0800

    Revert "[SLP] Remove SchedulingPriority from ScheduleData [NFC]"
    
    This reverts commit a3e9b32c00959ad5c73189d8378d019fbe80ade5.

commit ee48b2f1c3f646a32557d21b7cf476466a8b8a2f
Author: Arthur Eubanks <aeubanks@google.com>
Date:   Tue Mar 1 16:21:43 2022 -0800

    Revert "[SLP][NFC]Add a test for bottom to top reordering."
    
    This reverts commit b3f4535a039918965adb21509700739afc25f9f1.

commit b04a828ae3c19f5aa699f5e9821b04cb413bca9d
Author: Arthur Eubanks <aeubanks@google.com>
Date:   Tue Mar 1 16:21:38 2022 -0800

    Revert "[SLP]Improve bottom-to-top reordering."
    
    This reverts commit e4b9640867150723b33f81c6479682fc955b55aa.

In D118538#3353157, @aeubanks wrote:

Actually this sequence seems to pass check-llvm with expensive checks on. Is this reasonable?
(only reverting the top two causes the test added in ee48b2f1c3f646a32557d21b7cf476466a8b8a2f to fail)

Can you try to revert only this:

commit c0918b5685df34a1df56eaec9408b0053adce145 (HEAD)
Author: Arthur Eubanks <aeubanks@google.com>
Date:   Tue Mar 1 16:23:04 2022 -0800

    Revert "[SLP] Schedule only sub-graph of vectorizable instructions"
    
    This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d.

commit 97c3a3aea222b6cdc5972ebfa411a0f43ff45e7d
Author: Arthur Eubanks <aeubanks@google.com>
Date:   Tue Mar 1 16:22:59 2022 -0800

    Revert "[SLP] Remove SchedulingPriority from ScheduleData [NFC]"
    
    This reverts commit a3e9b32c00959ad5c73189d8378d019fbe80ade5.

Can you try to revert only this:

That causes the test introduced in b3f4535a03991896 to fail (llvm/test/Transforms/SLPVectorizer/X86/bottom-to-top-reorder.ll)

In D118538#3353249, @aeubanks wrote:

Can you try to revert only this:

That causes the test introduced in b3f4535a03991896 to fail (llvm/test/Transforms/SLPVectorizer/X86/bottom-to-top-reorder.ll)

Could you just update test checks for this test?

In D118538#3353253, @ABataev wrote:

In D118538#3353249, @aeubanks wrote:

Can you try to revert only this:

That causes the test introduced in b3f4535a03991896 to fail (llvm/test/Transforms/SLPVectorizer/X86/bottom-to-top-reorder.ll)

Could you just update test checks for this test?

Ah yes will do.

aeubanks mentioned this in rG6987ac79033b: Revert "[SLP] Remove SchedulingPriority from ScheduleData [NFC]". Mar 1 2022, 5:32 PM

aeubanks added a reverting change: rG9c6250ee41df: Revert "[SLP] Schedule only sub-graph of vectorizable instructions".

In D118538#3353256, @aeubanks wrote:

In D118538#3353253, @ABataev wrote:

In D118538#3353249, @aeubanks wrote:

Can you try to revert only this:

That causes the test introduced in b3f4535a03991896 to fail (llvm/test/Transforms/SLPVectorizer/X86/bottom-to-top-reorder.ll)

Could you just update test checks for this test?

Ah yes will do.

Thanks!

uabelho added a subscriber: uabelho. Mar 1 2022, 11:47 PM

Thank you for the revert. I agree that the test case above shows a correctness problem in SLP. I don't yet see how that's related to this change, but will investigate and see what falls out.

Initial investigation - SLP is allowing the formation of a bundle containing allocas. This is highly suspect as there is a dependency edge between an alloca and any following stacksave that the code does not model. There's also a question of whether static allocas should be rescheduled such that they're no longer static.

I can definitely see why this interacts badly with the reverted patch. I'm currently suspicious the existing code is also wrong, working on trying to find a test case now.

Further update - the old code is "correct" for definitions of correct which require multiple accidents to keep correct behavior. More than that, the code relies on forming an illegal bundle (e.g. one which has missing dependencies) in order to hit the shufflevector code path. Have I mentioned recently that this code is less than ideal? Working on a fix now.

reames mentioned this in rG29028e47bd9b: [slp] Add tests for cause of D118538 revert. Mar 2 2022, 9:45 AM

reames added a commit: rG738042711bc0: Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"". Mar 2 2022, 10:47 AM

Root issue fixed (689bab) and patch reapplied. Please confirm the unreduced example is also fixed; it's always possible there were two issues and we only reduced/fixed one.

Hello,

We see other problems that started appearing with this commit.
With

build-all-builtins/bin/clang -finline-hint-functions -fstack-protector-all -fwrapv -std=c99 -fsanitize=undefined -O3 'vla_sum_4.c' -fsanitize=undefined -l gcc_s -o 'vla_sum_4.out'
./vla_sum_4.out

we see

*** stack smashing detected ***: ./vla_sum_4.out terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fc20889c697]
/lib64/libc.so.6(+0x118652)[0x7fc20889c652]
./vla_sum_4.out[0x42b618]
./vla_sum_4.out[0x42b92d]
./vla_sum_4.out[0x42bb45]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fc2087a6555]
./vla_sum_4.out[0x402d25]
======= Memory map: ========
00400000-0043f000 r-xp 00000000 fd:01 51014653                           /repo/uabelho/master-github/llvm/vla_sum_4.out
0063f000-00640000 r--p 0003f000 fd:01 51014653                           /repo/uabelho/master-github/llvm/vla_sum_4.out
00640000-00643000 rw-p 00040000 fd:01 51014653                           /repo/uabelho/master-github/llvm/vla_sum_4.out
00643000-00f84000 rw-p 00000000 00:00 0 
023ce000-023ef000 rw-p 00000000 00:00 0                                  [heap]
7fc208784000-7fc208948000 r-xp 00000000 fd:00 33598593                   /usr/lib64/libc-2.17.so
7fc208948000-7fc208b47000 ---p 001c4000 fd:00 33598593                   /usr/lib64/libc-2.17.so
7fc208b47000-7fc208b4b000 r--p 001c3000 fd:00 33598593                   /usr/lib64/libc-2.17.so
7fc208b4b000-7fc208b4d000 rw-p 001c7000 fd:00 33598593                   /usr/lib64/libc-2.17.so
7fc208b4d000-7fc208b52000 rw-p 00000000 00:00 0 
7fc208b52000-7fc208b54000 r-xp 00000000 fd:00 34860521                   /usr/lib64/libdl-2.17.so
7fc208b54000-7fc208d54000 ---p 00002000 fd:00 34860521                   /usr/lib64/libdl-2.17.so
7fc208d54000-7fc208d55000 r--p 00002000 fd:00 34860521                   /usr/lib64/libdl-2.17.so
7fc208d55000-7fc208d56000 rw-p 00003000 fd:00 34860521                   /usr/lib64/libdl-2.17.so
7fc208d56000-7fc208e57000 r-xp 00000000 fd:00 34860522                   /usr/lib64/libm-2.17.so
7fc208e57000-7fc209056000 ---p 00101000 fd:00 34860522                   /usr/lib64/libm-2.17.so
7fc209056000-7fc209057000 r--p 00100000 fd:00 34860522                   /usr/lib64/libm-2.17.so
7fc209057000-7fc209058000 rw-p 00101000 fd:00 34860522                   /usr/lib64/libm-2.17.so
7fc209058000-7fc20905f000 r-xp 00000000 fd:00 34860529                   /usr/lib64/librt-2.17.so
7fc20905f000-7fc20925e000 ---p 00007000 fd:00 34860529                   /usr/lib64/librt-2.17.so
7fc20925e000-7fc20925f000 r--p 00006000 fd:00 34860529                   /usr/lib64/librt-2.17.so
7fc20925f000-7fc209260000 rw-p 00007000 fd:00 34860529                   /usr/lib64/librt-2.17.so
7fc209260000-7fc209277000 r-xp 00000000 fd:00 33555219                   /usr/lib64/libpthread-2.17.so
7fc209277000-7fc209476000 ---p 00017000 fd:00 33555219                   /usr/lib64/libpthread-2.17.so
7fc209476000-7fc209477000 r--p 00016000 fd:00 33555219                   /usr/lib64/libpthread-2.17.so
7fc209477000-7fc209478000 rw-p 00017000 fd:00 33555219                   /usr/lib64/libpthread-2.17.so
7fc209478000-7fc20947c000 rw-p 00000000 00:00 0 
7fc20947c000-7fc209491000 r-xp 00000000 fd:00 34815442                   /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fc209491000-7fc209690000 ---p 00015000 fd:00 34815442                   /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fc209690000-7fc209691000 r--p 00014000 fd:00 34815442                   /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fc209691000-7fc209692000 rw-p 00015000 fd:00 34815442                   /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fc209692000-7fc2096b4000 r-xp 00000000 fd:00 34860516                   /usr/lib64/ld-2.17.so
7fc20987d000-7fc209882000 rw-p 00000000 00:00 0 
7fc2098a2000-7fc2098b3000 rw-p 00000000 00:00 0 
7fc2098b3000-7fc2098b4000 r--p 00021000 fd:00 34860516                   /usr/lib64/ld-2.17.so
7fc2098b4000-7fc2098b5000 rw-p 00022000 fd:00 34860516                   /usr/lib64/ld-2.17.so
7fc2098b5000-7fc2098b6000 rw-p 00000000 00:00 0 
7ffd12f86000-7ffd12fa8000 rw-p 00000000 00:00 0                          [stack]
7ffd12fcc000-7ffd12fce000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Abort

vla_sum_4.c (1 KB)

Edit: This happens both with the original commit as well as the reapplied one.

Another somewhat similar case, but with fewer compiler flags needed:

build-all-builtins/bin/clang -finline-hint-functions -std=c99 -fsanitize=undefined -O2 'vla_sum_1.c' -o 'vla_sum_1.out'
./vla_sum_1.out

Result:

UndefinedBehaviorSanitizer:DEADLYSIGNAL
==160192==ERROR: UndefinedBehaviorSanitizer: BUS on unknown address (pc 0x00000042b6c9 bp 0x7fff7d630950 sp 0x1000100010001 T160192)
==160192==The signal is caused by a READ memory access.
==160192==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
    #0 0x42b6c9  (/repo/uabelho/master-github/llvm/vla_sum_1.out+0x42b6c9)
    #1 0x42b857  (/repo/uabelho/master-github/llvm/vla_sum_1.out+0x42b857)
    #2 0x7fa2c69b4554  (/lib64/libc.so.6+0x22554) (BuildId: e6847a931dd483773bab779dd3985b17c11caab2)
    #3 0x402cb4  (/repo/uabelho/master-github/llvm/vla_sum_1.out+0x402cb4)

UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: BUS (/repo/uabelho/master-github/llvm/vla_sum_1.out+0x42b6c9) 
==160192==ABORTING

vla_sum_1.c (992 B)

In D118538#3356307, @uabelho wrote:

We see other problems that started appearing with this commit.

If you have known problems, please file a bug. Waiting until I fix one and then reporting something new in the review after a reland is less than helpful.

I'm about to revert again. I don't have time to investigate this immediately, so it'll probably be a day or two before I report back on the cause of this one.

Sounds like you're already planning on re-reverting, but the unreduced case on our end started failing again at the reland as well: https://bugs.chromium.org/p/chromium/issues/detail?id=1300322#c15

reames added a reverting change: rGdeae979a2cfb: Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""". Mar 3 2022, 11:35 AM

Running slp-vectorizer on the attached file, you'll see that the same issue with stacksave/restore and inalloca allocas happens around the second call to IsDeprecatedMap.

b.ll (181 KB)

In D118538#3357833, @reames wrote:

In D118538#3356307, @uabelho wrote:

We see other problems that started appearing with this commit.

If you have known problems, please file a bug. Waiting until I fix one and then reporting something new in the review after a reland is less than helpful.

I'm about to revert again. I don't have time to investigate this immediately, so it'll probably be a day or two before I report back on the cause of this one.

I wrote
https://github.com/llvm/llvm-project/issues/54197
and
https://github.com/llvm/llvm-project/issues/54198
about the two problems I've seen.

I did take a look at this, and figured out at least one of the problems. The scheduling code does not account for implicit control dependencies (e.g. consider a case where we reorder two maythrow functions). The alloca symptom is one example of this, but there are also a bunch of others. The original code ends up being (mostly?) correct by relying on the scheduler not to change the order of two instructions which are both "ready". I have this bad feeling that the original code is wrong in some corner case, but don't have a reproducer. To reland this patch, I will need to implement implicit control flow dependencies.

There may be another problem here as well. I discovered the above via code inspection, and I can't quite map how that bug causes the sanitizer failures. However, I haven't looked into that as a) the reproduction is annoyingly complicated and b) I've been focused on memoryssa in the meantime.
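One way to picture the implicit control dependencies mentioned above is as extra edges in the dependency graph. The sketch below is a simplified toy model and not the fix that eventually landed: `ToyInst`, its two flags, and `implicitControlDeps` are assumptions made for illustration. Each instruction that is not safe to speculate gets an edge from the nearest preceding instruction that might not transfer execution to its successor (for example a call that may throw).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-instruction facts for the toy model.
struct ToyInst {
  bool MayNotTransferExecution; // e.g. a call that may throw or not return
  bool SafeToSpeculate;         // e.g. a plain add with no side effects
};

using Edge = std::pair<std::size_t, std::size_t>; // (earlier, later)

// Adds an edge from the most recent "barrier" instruction to every later
// instruction that must not be hoisted above it. Barriers are themselves
// not speculatable in the example below, so they end up chained together
// and keep their relative order.
std::vector<Edge> implicitControlDeps(const std::vector<ToyInst> &Block) {
  std::vector<Edge> Deps;
  bool HaveBarrier = false;
  std::size_t LastBarrier = 0;
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (HaveBarrier && !Block[I].SafeToSpeculate)
      Deps.push_back({LastBarrier, I});
    if (Block[I].MayNotTransferExecution) {
      HaveBarrier = true;
      LastBarrier = I;
    }
  }
  return Deps;
}
```

For a block of the form maythrow call, add, maythrow call, store, this yields the edges (0, 2) and (2, 3): the two calls stay ordered, the store stays below the second call, and the add remains free to move.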

reames mentioned this in rG6253b77da9f3: [SLP] Respect control dependence within a block during scheduling. Mar 19 2022, 1:36 PM

reames mentioned this in rGb7806c8b3764: [SLP] Explicit track required stacksave/alloca dependency. Mar 20 2022, 1:58 PM

Ok, at this point, I've identified two missing dependency bugs in the original code, and fixed both. I believe these two explain the regressions reported against this change in their entirety.

I have run a stage2 build with scheduling priority reversed (so as to help expose any further missing dependencies) and expensive checks (in this file only) enabled, both with and without this change applied. I do not see any indication of further missing dependencies, but scheduling defects which aren't caught by the verifier are only moderately likely to be caught by this approach.

I'm going to wait a couple days to make sure the dependency changes stick without issue, and then reapply with this change. In the meantime, if anyone wants to retest on top-of-tree with this change applied, I'd really appreciate positive confirmation that I got everything lurking here.
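To see why reversing the scheduling priority helps flush out missing dependencies, consider this minimal sketch of a list scheduler. It is a toy, assuming a top-down scheduler over instruction indices; `listSchedule` and `respectsOrder` are hypothetical names, and the real SLP scheduler works differently in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>; // (earlier, later)

// Top-down list scheduler over N instructions numbered in program order.
// Among ready instructions it picks the lowest index, or the highest one
// when ReversePriority is set.
std::vector<std::size_t> listSchedule(std::size_t N,
                                      const std::vector<Edge> &Deps,
                                      bool ReversePriority) {
  std::vector<std::size_t> Pending(N, 0);
  for (const Edge &E : Deps) {
    assert(E.first < N && E.second < N && "edge endpoint out of range");
    ++Pending[E.second];
  }
  std::vector<bool> Done(N, false);
  std::vector<std::size_t> Order;
  while (Order.size() < N) {
    std::size_t Pick = N; // N means "nothing ready"
    for (std::size_t I = 0; I < N; ++I) {
      if (Done[I] || Pending[I] != 0)
        continue;
      if (Pick == N || ReversePriority)
        Pick = I;
    }
    if (Pick == N)
      break; // dependency cycle; return the partial order
    Done[Pick] = true;
    Order.push_back(Pick);
    for (const Edge &E : Deps)
      if (E.first == Pick)
        --Pending[E.second];
  }
  return Order;
}

// Checks a complete schedule (a permutation of 0..N-1) against the edges
// that are truly required, whether or not the scheduler knew about them.
bool respectsOrder(const std::vector<std::size_t> &Order,
                   const std::vector<Edge> &Required) {
  std::vector<std::size_t> Pos(Order.size(), 0);
  for (std::size_t P = 0; P < Order.size(); ++P)
    Pos[Order[P]] = P;
  for (const Edge &E : Required)
    if (Pos[E.first] >= Pos[E.second])
      return false;
  return true;
}
```

Take three instructions with required edges (0, 1) and (1, 2), and give the scheduler only (1, 2). With the default priority the original order survives by accident; with the priority reversed, instruction 1 is emitted before instruction 0 and the missing edge becomes visible.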

In D118538#3359672, @uabelho wrote:

I wrote
https://github.com/llvm/llvm-project/issues/54197
and
https://github.com/llvm/llvm-project/issues/54198
about the two problems I've seen.

I've verified that the above two problems I saw both disappeared with 6253b77da9:

[SLP] Respect control dependence within a block during scheduling

In D118538#3395837, @uabelho wrote:
In D118538#3359672, @uabelho wrote:

I wrote
https://github.com/llvm/llvm-project/issues/54197
and
https://github.com/llvm/llvm-project/issues/54198
about the two problems I've seen.

I've verified that the above two problems I saw both disappeared with 6253b77da9:
[SLP] Respect control dependence within a block during scheduling

Thank you! Positive confirmation here very much appreciated.

Hi,

Heads up that I've seen the following assert (added in 6253b77da) trigger in one of our downstream tests:

opt: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:8066: void llvm::slpvectorizer::BoUpSLP::BlockScheduling::calculateDependencies(llvm::slpvectorizer::BoUpSLP::ScheduleData *, bool, llvm::slpvectorizer::BoUpSLP *): Assertion `DepDest && "must be in schedule window"' failed.

I'll try to extract a reproducer and come back.

In D118538#3398720, @uabelho wrote:
Hi,

Heads up that I've seen the following assert (added in 6253b77da) trigger in one of our downstream tests:
opt: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:8066: void llvm::slpvectorizer::BoUpSLP::BlockScheduling::calculateDependencies(llvm::slpvectorizer::BoUpSLP::ScheduleData *, bool, llvm::slpvectorizer::BoUpSLP *): Assertion `DepDest && "must be in schedule window"' failed.
I'll try to extract a reproducer and come back.

Ok, at commit 31486a9fc27 it crashed with

opt -passes='function<eager-inv>(slp-vectorizer)' -o /dev/null slp.ll

However, I noticed that it doesn't crash on trunk anymore after commit 79a182371e

[SLP]Make stricter check for instructions that do not require
scheduling.

Need to check that the instructions with external operands can be
reordered safely before actualy exclude them from the scheduling.

So perhaps/hopefully the problem is already solved.

slp.ll (494 B)

Edit: Ok, now I realize this has been reported already in https://github.com/llvm/llvm-project/issues/54465

I've run some more testing with this patch reapplied locally without seeing problems, so at least nothing seems to show up immediately for us.
(It's always hard to know how much testing is "enough" with fuzz testing, but I've run tests for two nights without seeing anything other than the problem above, which has already been fixed.)

@uabelho - Thank you! I really appreciate the amount of work you've put towards confirming that no further problems are lurking. You've gone above and beyond here, thanks!

Since we seem to have reasonable confidence that the underlying issues have been fixed, and that the fixes for those issues are stable in tree, I'm going to go ahead and push the original change again this morning. I will monitor for a couple hours in case of immediate fallout, but will be out of office for a good chunk of the day. If anyone spots fallout that looks severe, please feel free to revert without me.

Well, it seems I got slightly ahead of myself. When reviewing the diff before push, I spotted another questionable looking alloca/stacksave reordering. We appear to have one more missing dependency.

define void @stacksave2(i8 %a, i8 %b, i8 %c) {
; CHECK-LABEL: @stacksave2(
-; CHECK-NEXT: [[V1:%.*]] = alloca i8, align 1
; CHECK-NEXT: [[STACK:%.*]] = call i8* @llvm.stacksave()
+; CHECK-NEXT: [[B2:%.*]] = getelementptr i8*, i8 [[B:%.*]], i32 1
+; CHECK-NEXT: [[V1:%.*]] = alloca i8, align 1

Here we are sinking an alloca from above a stacksave into the region of the save/restore. This looks pretty suspect, as there could be a use of v1 after the stackrestore, and thus the transform could introduce an out-of-bounds access on the stack. The only way I can see this being correct is if the original program were already undefined, but I don't currently see why that would be the case.

Here's where I'm very happy I spent time to write extensive tests even though at the time I didn't think we needed this dependence.
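The alloca/stacksave hazard described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyStack` and the two helper functions are hypothetical names invented for illustration, the stack is modeled as a simple bump pointer, and nothing here is LLVM's actual implementation of `llvm.stacksave` or `llvm.stackrestore`.

```cpp
#include <cassert>
#include <cstddef>

// Toy bump-pointer stack (hypothetical; not LLVM code).
struct ToyStack {
  std::size_t Top = 0;                              // next free slot
  std::size_t allocaSlot() { return Top++; }        // models `alloca`
  std::size_t save() const { return Top; }          // models `llvm.stacksave`
  void restore(std::size_t Saved) { Top = Saved; }  // models `llvm.stackrestore`
  bool isLive(std::size_t Slot) const { return Slot < Top; }
};

// Original order: the alloca executes before the stacksave.
inline bool slotLiveAfterRestoreOriginalOrder() {
  ToyStack S;
  std::size_t V1 = S.allocaSlot();
  std::size_t Saved = S.save();
  S.allocaSlot();    // some allocation inside the save/restore region
  S.restore(Saved);
  return S.isLive(V1); // V1 sits below the saved top, so it survives
}

// Reordered: the alloca has been sunk below the stacksave.
inline bool slotLiveAfterRestoreSunkOrder() {
  ToyStack S;
  std::size_t Saved = S.save();
  std::size_t V1 = S.allocaSlot();
  S.restore(Saved);
  return S.isLive(V1); // V1 was released by the restore
}
```

In this model the slot survives the restore only in the original order, which is why a later use of `v1` would be a problem after the reordering.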

reames mentioned this in rGa16308c2823b: [SLP] Explicit track required stacksave/alloca dependency (try 3). Mar 25 2022, 10:04 AM

reames added a commit: rG48cc9287f555: Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3). Mar 25 2022, 10:44 AM

I've gone ahead and fixed the last missing dependence case I found, and landed the rebased patch. At this point, I'm not expecting further fallout, but if anyone sees anything, please feel free to immediately revert as long as you've filed a bug with a reproducer. I would not be shocked to find we do have another missing scheduling dependence somewhere.

Diff 405696

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[2,692 lines not shown; enclosing context: void schedule(ScheduleData *SD, ReadyListType &ReadyList) {]
       // If BundleMember is a stand-alone instruction, no operand reordering
       // has taken place, so we directly access its operands.
       for (Use &U : BundleMember->Inst->operands())
         if (auto *I = dyn_cast<Instruction>(U.get()))
           DecrUnsched(I);
     }
     // Handle the memory dependencies.
     for (ScheduleData *MemoryDepSD : BundleMember->MemoryDependencies) {
-      if (MemoryDepSD->incrementUnscheduledDeps(-1) == 0) {
+      if (MemoryDepSD->hasValidDependencies() &&
+          MemoryDepSD->incrementUnscheduledDeps(-1) == 0) {
         // There are no more unscheduled dependencies after decrementing,
         // so we can put the dependent instruction into the ready list.
         ScheduleData *DepBundle = MemoryDepSD->FirstInBundle;
         assert(!DepBundle->IsScheduled &&
                "already scheduled bundle gets ready");
         ReadyList.insert(DepBundle);
         LLVM_DEBUG(dbgs()
                    << "SLP: gets ready (mem): " << *DepBundle << "\n");
[13 lines not shown; enclosing context: void doForAllOpcodes(Value *V,]
       Action(P.second);
   }

   /// Put all instructions into the ReadyList which are ready for scheduling.
   template <typename ReadyListType>
   void initialFillReadyList(ReadyListType &ReadyList) {
     for (auto *I = ScheduleStart; I != ScheduleEnd; I = I->getNextNode()) {
       doForAllOpcodes(I, [&](ScheduleData *SD) {
-        if (SD->isSchedulingEntity() && SD->isReady()) {
+        if (SD->isSchedulingEntity() && SD->hasValidDependencies() &&
+            SD->isReady()) {
           ReadyList.insert(SD);
           LLVM_DEBUG(dbgs()
                      << "SLP: initially in ready list: " << *I << "\n");
         }
       });
     }
   }

[5,106 lines not shown]
 }

 void BoUpSLP::scheduleBlock(BlockScheduling *BS) {
   if (!BS->ScheduleStart)
     return;

   LLVM_DEBUG(dbgs() << "SLP: schedule block " << BS->BB->getName() << "\n");

+  // A key point - if we got here, pre-scheduling was able to find a valid
+  // scheduling of the sub-graph of the scheduling window which consists
+  // of all vector bundles and their transitive users. As such, we do not
+  // need to reschedule anything outside of that subgraph.
+
   BS->resetSchedule();

   // For the real scheduling we use a more sophisticated ready-list: it is
   // sorted by the original instruction location. This lets the final schedule
   // be as close as possible to the original instruction order.
   struct ScheduleDataCompare {
     bool operator()(ScheduleData *SD1, ScheduleData *SD2) const {
       return SD2->SchedulingPriority < SD1->SchedulingPriority;
     }
   };
   std::set<ScheduleData *, ScheduleDataCompare> ReadyInsts;

-  // Ensure that all dependency data is updated and fill the ready-list with
-  // initial instructions.
+  // Ensure that all dependency data is updated (for nodes in the sub-graph)
+  // and fill the ready-list with initial instructions.
   int Idx = 0;
-  int NumToSchedule = 0;
   for (auto *I = BS->ScheduleStart; I != BS->ScheduleEnd;
        I = I->getNextNode()) {
-    BS->doForAllOpcodes(I, [this, &Idx, &NumToSchedule, BS](ScheduleData *SD) {
+    BS->doForAllOpcodes(I, [this, &Idx, BS](ScheduleData *SD) {
       assert((isVectorLikeInstWithConstOps(SD->Inst) ||
               SD->isPartOfBundle() == (getTreeEntry(SD->Inst) != nullptr)) &&
              "scheduler and vectorizer bundle mismatch");
       SD->FirstInBundle->SchedulingPriority = Idx++;
-      if (SD->isSchedulingEntity()) {
+      if (SD->isSchedulingEntity() && SD->isPartOfBundle())
         BS->calculateDependencies(SD, false, this);
-        NumToSchedule++;
-      }
     });
   }
   BS->initialFillReadyList(ReadyInsts);

   Instruction *LastScheduledInst = BS->ScheduleEnd;

   // Do the "real" scheduling.
   while (!ReadyInsts.empty()) {
     ScheduleData *picked = *ReadyInsts.begin();
     ReadyInsts.erase(ReadyInsts.begin());

     // Move the scheduled instruction(s) to their dedicated places, if not
     // there yet.
     for (ScheduleData *BundleMember = picked; BundleMember;
          BundleMember = BundleMember->NextInBundle) {
       Instruction *pickedInst = BundleMember->Inst;
       if (pickedInst->getNextNode() != LastScheduledInst)
         pickedInst->moveBefore(LastScheduledInst);
       LastScheduledInst = pickedInst;
     }

     BS->schedule(picked, ReadyInsts);
-    NumToSchedule--;
   }
-  assert(NumToSchedule == 0 && "could not schedule all instructions");
Inline comments on the removed assert:

ABataev: Can we keep this assert here or replace it with another one? It helps in many cases with incorrect scheduling.

reames: Not easily. We'd need to track the increments through the calls to calculateDependencies since the set size now depends on the transitive use walk. I get why you want this, but I don't see an easy way to preserve it. Do you think it's worth the complexity of plumbing an assert only param through calculateDependencies?

ABataev: No, sure not. But can you try to implement something simple here?

reames: Did you have something particular in mind? Not trying to be difficult, I just don't see a simple assert here.

reames: Thinking about this, I could at least add an assert that all of the bundled instructions were scheduled. That wouldn't handle transitive users of the vector tree, but it would be better than nothing. Would that satisfy you?

ABataev: This code is very sensitive and as you already noted it might be very useful to keep this (or similar) assert. Could you add an assert for bundled instructions and transitive users to be absolutely sure that neither this patch, nor future ones, break anything in scheduling?

reames: Landed in 2e507607, patch rebased. This turned out much simpler than I'd pictured, and is clearly warranted. Thank you for pushing on this.

ABataev: Will try to test it tomorrow.
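The weaker check floated in the thread above (that every bundled instruction ended up scheduled) could look roughly like the following. This is a hypothetical sketch only: `ToyScheduleData` and `allBundledScheduled` are invented names, and this is not claimed to be the assert that landed in 2e507607, whose contents are not shown on this page.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-instruction scheduling record.
struct ToyScheduleData {
  bool PartOfBundle; // instruction belongs to a vector bundle
  bool IsScheduled;  // instruction was placed by the scheduler
};

// Returns true when no bundled instruction in the window was left
// unscheduled. Instructions outside any bundle are deliberately ignored,
// mirroring the idea that only the sub-graph must be rescheduled.
inline bool allBundledScheduled(const std::vector<ToyScheduleData> &Window) {
  for (const ToyScheduleData &SD : Window)
    if (SD.PartOfBundle && !SD.IsScheduled)
      return false;
  return true;
}
```

As noted in the thread, a check of this shape would not cover transitive users of the vector tree; it only catches a bundle member that was skipped.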

   // Avoid duplicate scheduling of the block.
   BS->ScheduleStart = nullptr;
 }

 unsigned BoUpSLP::getVectorElementSize(Value *V) {
   // If V is a store, just return the width of the stored value (or value
   // truncated just before storing) without traversing the expression tree.
[2,568 lines not shown]
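To make the scheduleBlock change above easier to follow, here is a small stand-alone model of bottom-up list scheduling restricted to a sub-graph. It is a sketch under assumptions, not the SLP vectorizer's code: `ToyNode` and `scheduleSubgraph` are hypothetical names, a node's index stands in for its original position (the role `SchedulingPriority` plays in the diff), and `HasValidDeps` stands in for `hasValidDependencies()`.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical stand-in for ScheduleData.
struct ToyNode {
  bool HasValidDeps;         // true only for nodes inside the sub-graph
  int UnscheduledDeps;       // users inside the sub-graph not yet scheduled
  std::vector<int> Operands; // indices of the nodes this node uses
};

// Bottom-up list scheduling over the sub-graph only. Returns the order in
// which nodes were picked (latest original position first).
inline std::vector<int> scheduleSubgraph(std::vector<ToyNode> Nodes) {
  // Ready set ordered by original position, highest first.
  std::set<int, std::greater<int>> Ready;
  for (int I = 0, E = static_cast<int>(Nodes.size()); I != E; ++I)
    if (Nodes[I].HasValidDeps && Nodes[I].UnscheduledDeps == 0)
      Ready.insert(I);

  std::vector<int> Order;
  while (!Ready.empty()) {
    int Picked = *Ready.begin();
    Ready.erase(Ready.begin());
    Order.push_back(Picked);
    for (int Op : Nodes[Picked].Operands) {
      ToyNode &Dep = Nodes[Op];
      // Nodes outside the sub-graph never had their counts computed,
      // so they are skipped rather than decremented.
      if (Dep.HasValidDeps && --Dep.UnscheduledDeps == 0)
        Ready.insert(Op);
    }
  }
  return Order;
}
```

For a four-node block where node 1 is an unrelated scalar outside the sub-graph, only nodes 3, 2, and 0 are picked, and node 1 is left where it was. The two guards on `HasValidDeps` correspond to the two places the diff adds a `hasValidDependencies()` check.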

llvm/test/Transforms/SLPVectorizer/AArch64/64-bit-vector.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -slp-vectorizer -mtriple=aarch64--linux-gnu -mcpu=generic < %s | FileCheck %s
 ; RUN: opt -S -slp-vectorizer -mtriple=aarch64-apple-ios -mcpu=cyclone < %s | FileCheck %s
 ; Currently disabled for a few subtargets (e.g. Kryo):
 ; RUN: opt -S -slp-vectorizer -mtriple=aarch64--linux-gnu -mcpu=kryo < %s | FileCheck --check-prefix=NO_SLP %s
 ; RUN: opt -S -slp-vectorizer -mtriple=aarch64--linux-gnu -mcpu=generic -slp-min-reg-size=128 < %s | FileCheck --check-prefix=NO_SLP %s

 define void @f(float* %r, float* %w) {
 ; CHECK-LABEL: @f(
 ; CHECK-NEXT: [[R0:%.*]] = getelementptr inbounds float, float* [[R:%.*]], i64 0
 ; CHECK-NEXT: [[R1:%.*]] = getelementptr inbounds float, float* [[R]], i64 1
+; CHECK-NEXT: [[W0:%.*]] = getelementptr inbounds float, float* [[W:%.*]], i64 0
+; CHECK-NEXT: [[W1:%.*]] = getelementptr inbounds float, float* [[W]], i64 1
 ; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[R0]] to <2 x float>*
 ; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
 ; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP2]], [[TMP2]]
-; CHECK-NEXT: [[W0:%.*]] = getelementptr inbounds float, float* [[W:%.*]], i64 0
-; CHECK-NEXT: [[W1:%.*]] = getelementptr inbounds float, float* [[W]], i64 1
 ; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[W0]] to <2 x float>*
 ; CHECK-NEXT: store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4
 ; CHECK-NEXT: ret void
 ;
 ; NO_SLP-LABEL: @f(
 ; NO_SLP-NEXT: [[R0:%.*]] = getelementptr inbounds float, float* [[R:%.*]], i64 0
 ; NO_SLP-NEXT: [[R1:%.*]] = getelementptr inbounds float, float* [[R]], i64 1
 ; NO_SLP-NEXT: [[F0:%.*]] = load float, float* [[R0]], align 4
[21 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/commute.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -slp-vectorizer %s -slp-threshold=-10 | FileCheck %s
 target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
 target triple = "aarch64--linux-gnu"

 %structA = type { [2 x float] }

 define void @test1(%structA* nocapture readonly %J, i32 %xmin, i32 %ymin) {
 ; CHECK-LABEL: @test1(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[XMIN:%.*]], i32 0
 ; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[YMIN:%.*]], i32 1
 ; CHECK-NEXT: br label [[FOR_BODY3_LR_PH:%.*]]
 ; CHECK: for.body3.lr.ph:
-; CHECK-NEXT: [[TMP2:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x float>
 ; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [[STRUCTA:%.*]], %structA* [[J:%.*]], i64 0, i32 0, i64 0
 ; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [[STRUCTA]], %structA* [[J]], i64 0, i32 0, i64 1
+; CHECK-NEXT: [[TMP2:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x float>
 ; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX4]] to <2 x float>*
 ; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
 ; CHECK-NEXT: [[TMP5:%.*]] = fsub fast <2 x float> [[TMP2]], [[TMP4]]
 ; CHECK-NEXT: [[TMP6:%.*]] = fmul fast <2 x float> [[TMP5]], [[TMP5]]
 ; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP6]], i32 0
 ; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP6]], i32 1
 ; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP7]], [[TMP8]]
 ; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq float [[ADD]], 0.000000e+00
[26 lines not shown]

 define void @test2(%structA* nocapture readonly %J, i32 %xmin, i32 %ymin) {
 ; CHECK-LABEL: @test2(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[XMIN:%.*]], i32 0
 ; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[YMIN:%.*]], i32 1
 ; CHECK-NEXT: br label [[FOR_BODY3_LR_PH:%.*]]
 ; CHECK: for.body3.lr.ph:
-; CHECK-NEXT: [[TMP2:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x float>
 ; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [[STRUCTA:%.*]], %structA* [[J:%.*]], i64 0, i32 0, i64 0
 ; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [[STRUCTA]], %structA* [[J]], i64 0, i32 0, i64 1
+; CHECK-NEXT: [[TMP2:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x float>
 ; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX4]] to <2 x float>*
 ; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
 ; CHECK-NEXT: [[TMP5:%.*]] = fsub fast <2 x float> [[TMP2]], [[TMP4]]
 ; CHECK-NEXT: [[TMP6:%.*]] = fmul fast <2 x float> [[TMP5]], [[TMP5]]
 ; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP6]], i32 0
 ; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP6]], i32 1
 ; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP8]], [[TMP7]]
 ; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq float [[ADD]], 0.000000e+00
[26 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/gather-reduce.ll

[30 lines not shown]
 ; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]
 ; GENERIC: for.cond.cleanup:
 ; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
 ; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]
 ; GENERIC: for.body:
 ; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
 ; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
 ; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
+; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
 ; GENERIC-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*
 ; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
 ; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
 ; GENERIC-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*
 ; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2
 ; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
 ; GENERIC-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]
 ; GENERIC-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
[33 lines not shown]
 ; GENERIC-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32
 ; GENERIC-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
 ; GENERIC-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6
 ; GENERIC-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64
 ; GENERIC-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]
 ; GENERIC-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2
 ; GENERIC-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32
 ; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
-; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
 ; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7
 ; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
 ; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
 ; GENERIC-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
 ; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
 ; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
 ; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
 ; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
[9 lines not shown]
	; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]			; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]
	; KRYO: for.cond.cleanup:			; KRYO: for.cond.cleanup:
	; KRYO-NEXT: [[SUM_0_LCSSA:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ [[ADD66:%.]], [[FOR_COND_CLEANUP_LOOPEXIT:%.]] ]			; KRYO-NEXT: [[SUM_0_LCSSA:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ [[ADD66:%.]], [[FOR_COND_CLEANUP_LOOPEXIT:%.]] ]
	; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]			; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]
	; KRYO: for.body:			; KRYO: for.body:
	; KRYO-NEXT: [[I_0103:%.]] = phi i32 [ [[INC:%.]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[I_0103:%.]] = phi i32 [ [[INC:%.]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[A_ADDR_0101:%.]] = phi i16 [ [[INCDEC_PTR58:%.]], [[FOR_BODY]] ], [ [[A:%.]], [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[A_ADDR_0101:%.]] = phi i16 [ [[INCDEC_PTR58:%.]], [[FOR_BODY]] ], [ [[A:%.]], [[FOR_BODY_PREHEADER]] ]
				; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP0:%.]] = bitcast i16 [[A_ADDR_0101]] to <8 x i16>*			; KRYO-NEXT: [[TMP0:%.]] = bitcast i16 [[A_ADDR_0101]] to <8 x i16>*
	; KRYO-NEXT: [[TMP1:%.]] = load <8 x i16>, <8 x i16> [[TMP0]], align 2			; KRYO-NEXT: [[TMP1:%.]] = load <8 x i16>, <8 x i16> [[TMP0]], align 2
	; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
	; KRYO-NEXT: [[TMP3:%.]] = bitcast i16 [[B:%.]] to <8 x i16>			; KRYO-NEXT: [[TMP3:%.]] = bitcast i16 [[B:%.]] to <8 x i16>
	; KRYO-NEXT: [[TMP4:%.]] = load <8 x i16>, <8 x i16> [[TMP3]], align 2			; KRYO-NEXT: [[TMP4:%.]] = load <8 x i16>, <8 x i16> [[TMP3]], align 2
	; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
	; KRYO-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; KRYO-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]
	; KRYO-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; KRYO-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
	Show All 33 Lines
	; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32			; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32
	; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]			; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
	; KRYO-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; KRYO-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6
	; KRYO-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64			; KRYO-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64
	; KRYO-NEXT: [[ARRAYIDX55:%.]] = getelementptr inbounds i16, i16 [[G]], i64 [[TMP26]]			; KRYO-NEXT: [[ARRAYIDX55:%.]] = getelementptr inbounds i16, i16 [[G]], i64 [[TMP26]]
	; KRYO-NEXT: [[TMP27:%.]] = load i16, i16 [[ARRAYIDX55]], align 2			; KRYO-NEXT: [[TMP27:%.]] = load i16, i16 [[ARRAYIDX55]], align 2
	; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32			; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32
	; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7
	; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
	; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]			; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
	; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2			; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
	; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
	; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
	; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]			; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	▲ Show 20 Lines • Show All 120 Lines • ▼ Show 20 Lines
	; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]			; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]
	; GENERIC: for.cond.cleanup:			; GENERIC: for.cond.cleanup:
	; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]			; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]
	; GENERIC: for.body:			; GENERIC: for.body:
	; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
				; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; GENERIC-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*			; GENERIC-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*
	; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2			; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
	; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
	; GENERIC-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*			; GENERIC-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*
	; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2			; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2
	; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
	; GENERIC-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; GENERIC-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]
	; GENERIC-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; GENERIC-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
	Show All 33 Lines
	; GENERIC-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32			; GENERIC-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32
	; GENERIC-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]			; GENERIC-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
	; GENERIC-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; GENERIC-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6
	; GENERIC-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64			; GENERIC-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64
	; GENERIC-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]			; GENERIC-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]
	; GENERIC-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2			; GENERIC-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2
	; GENERIC-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32			; GENERIC-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32
	; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7
	; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
	; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]			; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
	; GENERIC-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2			; GENERIC-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
	; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
	; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
	; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]			; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	Show All 9 Lines
	; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]			; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]
	; KRYO: for.cond.cleanup:			; KRYO: for.cond.cleanup:
	; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]			; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]
	; KRYO: for.body:			; KRYO: for.body:
	; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
				; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*			; KRYO-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*
	; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2			; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
	; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
	; KRYO-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*			; KRYO-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*
	; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2			; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2
	; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
	; KRYO-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; KRYO-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]
	; KRYO-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; KRYO-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
	Show All 33 Lines
	; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32			; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32
	; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]			; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
	; KRYO-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; KRYO-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6
	; KRYO-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64			; KRYO-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64
	; KRYO-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]			; KRYO-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]
	; KRYO-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2			; KRYO-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2
	; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32			; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32
	; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7
	; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
	; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]			; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
	; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2			; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
	; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
	; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
	; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]			; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	▲ Show 20 Lines • Show All 111 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

	Show All 31 Lines
	; CHECK-NEXT: [[J_025:%.*]] = phi i32 [ 0, [[FOR_BODY_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[J_025:%.*]] = phi i32 [ 0, [[FOR_BODY_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[P2_024:%.*]] = phi i32* [ [[BLK2:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR29:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[P2_024:%.*]] = phi i32* [ [[BLK2:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR29:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[P1_023:%.*]] = phi i32* [ [[BLK1:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[P1_023:%.*]] = phi i32* [ [[BLK1:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[P1_023]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[P1_023]], i64 1
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[P2_024]], i64 1			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[P2_024]], i64 1
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[P1_023]], i64 2			; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[P1_023]], i64 2
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[P2_024]], i64 2			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[P2_024]], i64 2
	; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i32, i32* [[P1_023]], i64 3			; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i32, i32* [[P1_023]], i64 3
				; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[P2_024]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P1_023]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P1_023]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[P2_024]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[P2_024]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[P2_024]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <4 x i32> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <4 x i32> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP4]]			; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP4]]
	; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> [[TMP6]], <4 x i32> [[TMP4]]			; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> [[TMP6]], <4 x i32> [[TMP4]]
	; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP7]])			; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP7]])
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP8]], [[S_026]]			; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP8]], [[S_026]]
	▲ Show 20 Lines • Show All 107 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[J_019:%.*]] = phi i32 [ 0, [[FOR_BODY_LR_PH]] ], [ [[INC:%.*]], [[IF_END]] ]			; CHECK-NEXT: [[J_019:%.*]] = phi i32 [ 0, [[FOR_BODY_LR_PH]] ], [ [[INC:%.*]], [[IF_END]] ]
	; CHECK-NEXT: [[P2_018:%.*]] = phi i32* [ [[BLK2:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR16:%.*]], [[IF_END]] ]			; CHECK-NEXT: [[P2_018:%.*]] = phi i32* [ [[BLK2:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR16:%.*]], [[IF_END]] ]
	; CHECK-NEXT: [[P1_017:%.*]] = phi i32* [ [[BLK1:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR:%.*]], [[IF_END]] ]			; CHECK-NEXT: [[P1_017:%.*]] = phi i32* [ [[BLK1:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR:%.*]], [[IF_END]] ]
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[P1_017]], i64 1			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[P1_017]], i64 1
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[P2_018]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[P2_018]], i64 1
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[P1_017]], i64 2			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[P1_017]], i64 2
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[P2_018]], i64 2			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[P2_018]], i64 2
	; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, i32* [[P1_017]], i64 3			; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, i32* [[P1_017]], i64 3
				; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i32, i32* [[P2_018]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P1_017]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P1_017]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i32, i32* [[P2_018]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[P2_018]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[P2_018]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP4]])
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP5]], [[S_020]]			; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP5]], [[S_020]]
	; CHECK-NEXT: [[CMP14:%.*]] = icmp slt i32 [[OP_EXTRA]], [[LIM:%.*]]			; CHECK-NEXT: [[CMP14:%.*]] = icmp slt i32 [[OP_EXTRA]], [[LIM:%.*]]
	; CHECK-NEXT: br i1 [[CMP14]], label [[IF_END]], label [[FOR_END_LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 [[CMP14]], label [[IF_END]], label [[FOR_END_LOOPEXIT:%.*]]
	; CHECK: if.end:			; CHECK: if.end:
	▲ Show 20 Lines • Show All 92 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 3			; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 3
	; CHECK-NEXT: [[ARRAYIDX39:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 4			; CHECK-NEXT: [[ARRAYIDX39:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 4
	; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 4			; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 4
	; CHECK-NEXT: [[ARRAYIDX50:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 5			; CHECK-NEXT: [[ARRAYIDX50:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 5
	; CHECK-NEXT: [[ARRAYIDX52:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 5			; CHECK-NEXT: [[ARRAYIDX52:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 5
	; CHECK-NEXT: [[ARRAYIDX61:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 6			; CHECK-NEXT: [[ARRAYIDX61:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 6
	; CHECK-NEXT: [[ARRAYIDX63:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 6			; CHECK-NEXT: [[ARRAYIDX63:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 6
	; CHECK-NEXT: [[ARRAYIDX72:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 7			; CHECK-NEXT: [[ARRAYIDX72:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 7
				; CHECK-NEXT: [[ARRAYIDX74:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[P1_044]] to <8 x i8>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[P1_044]] to <8 x i8>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 1			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 1
	; CHECK-NEXT: [[TMP2:%.*]] = zext <8 x i8> [[TMP1]] to <8 x i32>			; CHECK-NEXT: [[TMP2:%.*]] = zext <8 x i8> [[TMP1]] to <8 x i32>
	; CHECK-NEXT: [[ARRAYIDX74:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 7
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[P2_045]] to <8 x i8>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[P2_045]] to <8 x i8>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[TMP3]], align 1			; CHECK-NEXT: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[TMP3]], align 1
	; CHECK-NEXT: [[TMP5:%.*]] = zext <8 x i8> [[TMP4]] to <8 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <8 x i8> [[TMP4]] to <8 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = icmp slt <8 x i32> [[TMP6]], zeroinitializer			; CHECK-NEXT: [[TMP7:%.*]] = icmp slt <8 x i32> [[TMP6]], zeroinitializer
	; CHECK-NEXT: [[TMP8:%.*]] = sub nsw <8 x i32> zeroinitializer, [[TMP6]]			; CHECK-NEXT: [[TMP8:%.*]] = sub nsw <8 x i32> zeroinitializer, [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = select <8 x i1> [[TMP7]], <8 x i32> [[TMP8]], <8 x i32> [[TMP6]]			; CHECK-NEXT: [[TMP9:%.*]] = select <8 x i1> [[TMP7]], <8 x i32> [[TMP8]], <8 x i32> [[TMP6]]
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP9]])			; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP9]])
	▲ Show 20 Lines • Show All 132 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/loadi8.ll

	Show All 12 Lines
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SCALE]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SCALE]], align 16
	; CHECK-NEXT: [[OFFSET:%.*]] = getelementptr inbounds [[STRUCT_WEIGHT_T]], %struct.weight_t* [[W]], i64 0, i32 1			; CHECK-NEXT: [[OFFSET:%.*]] = getelementptr inbounds [[STRUCT_WEIGHT_T]], %struct.weight_t* [[W]], i64 0, i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[OFFSET]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[OFFSET]], align 4
	; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i8, i8* [[DST:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds i8, i8* [[DST:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i8, i8* [[SRC]], i64 2			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i8, i8* [[SRC]], i64 2
	; CHECK-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds i8, i8* [[DST]], i64 2			; CHECK-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds i8, i8* [[DST]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i8, i8* [[SRC]], i64 3			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i8, i8* [[SRC]], i64 3
				; CHECK-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds i8, i8* [[DST]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[SRC]] to <4 x i8>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[SRC]] to <4 x i8>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* [[TMP2]], align 1			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* [[TMP2]], align 1
	; CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i8> [[TMP3]] to <4 x i32>			; CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i8> [[TMP3]] to <4 x i32>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = mul nsw <4 x i32> [[SHUFFLE]], [[TMP4]]			; CHECK-NEXT: [[TMP6:%.*]] = mul nsw <4 x i32> [[SHUFFLE]], [[TMP4]]
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP7]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP7]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP6]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[TMP6]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP9:%.*]] = icmp ult <4 x i32> [[TMP8]], <i32 256, i32 256, i32 256, i32 256>			; CHECK-NEXT: [[TMP9:%.*]] = icmp ult <4 x i32> [[TMP8]], <i32 256, i32 256, i32 256, i32 256>
	; CHECK-NEXT: [[TMP10:%.*]] = icmp sgt <4 x i32> [[TMP8]], zeroinitializer			; CHECK-NEXT: [[TMP10:%.*]] = icmp sgt <4 x i32> [[TMP8]], zeroinitializer
	; CHECK-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>			; CHECK-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>
	; CHECK-NEXT: [[TMP12:%.*]] = select <4 x i1> [[TMP9]], <4 x i32> [[TMP8]], <4 x i32> [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = select <4 x i1> [[TMP9]], <4 x i32> [[TMP8]], <4 x i32> [[TMP11]]
	; CHECK-NEXT: [[TMP13:%.*]] = trunc <4 x i32> [[TMP12]] to <4 x i8>			; CHECK-NEXT: [[TMP13:%.*]] = trunc <4 x i32> [[TMP12]] to <4 x i8>
	; CHECK-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds i8, i8* [[DST]], i64 3
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast i8* [[DST]] to <4 x i8>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast i8* [[DST]] to <4 x i8>*
	; CHECK-NEXT: store <4 x i8> [[TMP13]], <4 x i8>* [[TMP14]], align 1			; CHECK-NEXT: store <4 x i8> [[TMP13]], <4 x i8>* [[TMP14]], align 1
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%scale = getelementptr inbounds %struct.weight_t, %struct.weight_t* %w, i64 0, i32 0			%scale = getelementptr inbounds %struct.weight_t, %struct.weight_t* %w, i64 0, i32 0
	%0 = load i32, i32* %scale, align 16			%0 = load i32, i32* %scale, align 16
	%offset = getelementptr inbounds %struct.weight_t, %struct.weight_t* %w, i64 0, i32 1			%offset = getelementptr inbounds %struct.weight_t, %struct.weight_t* %w, i64 0, i32 1
	▲ Show 20 Lines • Show All 162 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/matmul.ll

	Show All 11 Lines
	; CHECK-LABEL: @wrap_mul4(			; CHECK-LABEL: @wrap_mul4(
	; CHECK-NEXT: [[ARRAYIDX1_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A:%.*]], i64 0, i64 0			; CHECK-NEXT: [[ARRAYIDX1_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A:%.*]], i64 0, i64 0
	; CHECK-NEXT: [[TEMP:%.*]] = load double, double* [[ARRAYIDX1_I]], align 8			; CHECK-NEXT: [[TEMP:%.*]] = load double, double* [[ARRAYIDX1_I]], align 8
	; CHECK-NEXT: [[ARRAYIDX3_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B:%.*]], i64 0, i64 0			; CHECK-NEXT: [[ARRAYIDX3_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B:%.*]], i64 0, i64 0
	; CHECK-NEXT: [[ARRAYIDX5_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A]], i64 0, i64 1			; CHECK-NEXT: [[ARRAYIDX5_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A]], i64 0, i64 1
	; CHECK-NEXT: [[TEMP2:%.*]] = load double, double* [[ARRAYIDX5_I]], align 8			; CHECK-NEXT: [[TEMP2:%.*]] = load double, double* [[ARRAYIDX5_I]], align 8
	; CHECK-NEXT: [[ARRAYIDX7_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 1, i64 0			; CHECK-NEXT: [[ARRAYIDX7_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 1, i64 0
	; CHECK-NEXT: [[ARRAYIDX13_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 0, i64 1			; CHECK-NEXT: [[ARRAYIDX13_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 0, i64 1
				; CHECK-NEXT: [[ARRAYIDX18_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 1, i64 1
				; CHECK-NEXT: [[ARRAYIDX25_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 0, i64 2
				; CHECK-NEXT: [[ARRAYIDX30_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 1, i64 2
				; CHECK-NEXT: [[ARRAYIDX37_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 0, i64 3
				; CHECK-NEXT: [[ARRAYIDX42_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 1, i64 3
				; CHECK-NEXT: [[ARRAYIDX47_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A]], i64 1, i64 0
				; CHECK-NEXT: [[TEMP10:%.*]] = load double, double* [[ARRAYIDX47_I]], align 8
				; CHECK-NEXT: [[ARRAYIDX52_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A]], i64 1, i64 1
				; CHECK-NEXT: [[TEMP11:%.*]] = load double, double* [[ARRAYIDX52_I]], align 8
				; CHECK-NEXT: [[RES_I_SROA_4_0_OUT2_I_SROA_IDX2:%.*]] = getelementptr inbounds double, double* [[OUT:%.*]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[ARRAYIDX3_I]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[ARRAYIDX3_I]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[TEMP]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[TEMP]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[TEMP]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[TEMP]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[ARRAYIDX18_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 1, i64 1
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[ARRAYIDX7_I]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[ARRAYIDX7_I]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[TEMP2]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[TEMP2]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[TEMP2]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[TEMP2]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP9]], [[TMP7]]			; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP9]], [[TMP7]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP5]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP5]], [[TMP10]]
	; CHECK-NEXT: [[ARRAYIDX25_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 0, i64 2			; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[OUT]] to <2 x double>*
	; CHECK-NEXT: [[ARRAYIDX30_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 1, i64 2
	; CHECK-NEXT: [[ARRAYIDX37_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 0, i64 3
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[ARRAYIDX25_I]] to <2 x double>*
	; CHECK-NEXT: [[TMP13:%.*]] = load <2 x double>, <2 x double>* [[TMP12]], align 8
	; CHECK-NEXT: [[TMP14:%.*]] = fmul <2 x double> [[TMP4]], [[TMP13]]
	; CHECK-NEXT: [[ARRAYIDX42_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 1, i64 3
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[ARRAYIDX30_I]] to <2 x double>*
	; CHECK-NEXT: [[TMP16:%.*]] = load <2 x double>, <2 x double>* [[TMP15]], align 8
	; CHECK-NEXT: [[TMP17:%.*]] = fmul <2 x double> [[TMP9]], [[TMP16]]
	; CHECK-NEXT: [[TMP18:%.*]] = fadd <2 x double> [[TMP14]], [[TMP17]]
	; CHECK-NEXT: [[ARRAYIDX47_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A]], i64 1, i64 0
	; CHECK-NEXT: [[TEMP10:%.*]] = load double, double* [[ARRAYIDX47_I]], align 8
	; CHECK-NEXT: [[ARRAYIDX52_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A]], i64 1, i64 1
	; CHECK-NEXT: [[TEMP11:%.*]] = load double, double* [[ARRAYIDX52_I]], align 8
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x double> poison, double [[TEMP10]], i32 0
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x double> [[TMP19]], double [[TEMP10]], i32 1
	; CHECK-NEXT: [[TMP21:%.*]] = fmul <2 x double> [[TMP2]], [[TMP20]]
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <2 x double> poison, double [[TEMP11]], i32 0
	; CHECK-NEXT: [[TMP23:%.*]] = insertelement <2 x double> [[TMP22]], double [[TEMP11]], i32 1
	; CHECK-NEXT: [[TMP24:%.*]] = fmul <2 x double> [[TMP7]], [[TMP23]]
	; CHECK-NEXT: [[TMP25:%.*]] = fadd <2 x double> [[TMP21]], [[TMP24]]
	; CHECK-NEXT: [[TMP26:%.*]] = fmul <2 x double> [[TMP13]], [[TMP20]]
	; CHECK-NEXT: [[TMP27:%.*]] = fmul <2 x double> [[TMP16]], [[TMP23]]
	; CHECK-NEXT: [[TMP28:%.*]] = fadd <2 x double> [[TMP26]], [[TMP27]]
	; CHECK-NEXT: [[RES_I_SROA_4_0_OUT2_I_SROA_IDX2:%.*]] = getelementptr inbounds double, double* [[OUT:%.*]], i64 1
	; CHECK-NEXT: [[TMP29:%.*]] = bitcast double* [[OUT]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP29]], align 8
	; CHECK-NEXT: [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 2			; CHECK-NEXT: [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 2
	; CHECK-NEXT: [[RES_I_SROA_6_0_OUT2_I_SROA_IDX6:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 3			; CHECK-NEXT: [[RES_I_SROA_6_0_OUT2_I_SROA_IDX6:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 3
	; CHECK-NEXT: [[TMP30:%.*]] = bitcast double* [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4]] to <2 x double>*			; CHECK-NEXT: [[TMP13:%.*]] = bitcast double* [[ARRAYIDX25_I]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP18]], <2 x double>* [[TMP30]], align 8			; CHECK-NEXT: [[TMP14:%.*]] = load <2 x double>, <2 x double>* [[TMP13]], align 8
				; CHECK-NEXT: [[TMP15:%.*]] = fmul <2 x double> [[TMP4]], [[TMP14]]
				; CHECK-NEXT: [[TMP16:%.]] = bitcast double [[ARRAYIDX30_I]] to <2 x double>*
				; CHECK-NEXT: [[TMP17:%.]] = load <2 x double>, <2 x double> [[TMP16]], align 8
				; CHECK-NEXT: [[TMP18:%.*]] = fmul <2 x double> [[TMP9]], [[TMP17]]
				; CHECK-NEXT: [[TMP19:%.*]] = fadd <2 x double> [[TMP15]], [[TMP18]]
				; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP12]], align 8
				; CHECK-NEXT: [[TMP20:%.*]] = bitcast double* [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4]] to <2 x double>*
				; CHECK-NEXT: store <2 x double> [[TMP19]], <2 x double>* [[TMP20]], align 8
	; CHECK-NEXT: [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 4			; CHECK-NEXT: [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 4
	; CHECK-NEXT: [[RES_I_SROA_8_0_OUT2_I_SROA_IDX10:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 5			; CHECK-NEXT: [[RES_I_SROA_8_0_OUT2_I_SROA_IDX10:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 5
	; CHECK-NEXT: [[TMP31:%.*]] = bitcast double* [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8]] to <2 x double>*			; CHECK-NEXT: [[TMP21:%.*]] = insertelement <2 x double> poison, double [[TEMP10]], i32 0
	; CHECK-NEXT: store <2 x double> [[TMP25]], <2 x double>* [[TMP31]], align 8			; CHECK-NEXT: [[TMP22:%.*]] = insertelement <2 x double> [[TMP21]], double [[TEMP10]], i32 1
				; CHECK-NEXT: [[TMP23:%.*]] = fmul <2 x double> [[TMP2]], [[TMP22]]
				; CHECK-NEXT: [[TMP24:%.*]] = insertelement <2 x double> poison, double [[TEMP11]], i32 0
				; CHECK-NEXT: [[TMP25:%.*]] = insertelement <2 x double> [[TMP24]], double [[TEMP11]], i32 1
				; CHECK-NEXT: [[TMP26:%.*]] = fmul <2 x double> [[TMP7]], [[TMP25]]
				; CHECK-NEXT: [[TMP27:%.*]] = fadd <2 x double> [[TMP23]], [[TMP26]]
				; CHECK-NEXT: [[TMP28:%.*]] = bitcast double* [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8]] to <2 x double>*
				; CHECK-NEXT: store <2 x double> [[TMP27]], <2 x double>* [[TMP28]], align 8
	; CHECK-NEXT: [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 6			; CHECK-NEXT: [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 6
	; CHECK-NEXT: [[RES_I_SROA_10_0_OUT2_I_SROA_IDX14:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 7			; CHECK-NEXT: [[RES_I_SROA_10_0_OUT2_I_SROA_IDX14:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 7
				; CHECK-NEXT: [[TMP29:%.*]] = fmul <2 x double> [[TMP14]], [[TMP22]]
				; CHECK-NEXT: [[TMP30:%.*]] = fmul <2 x double> [[TMP17]], [[TMP25]]
				; CHECK-NEXT: [[TMP31:%.*]] = fadd <2 x double> [[TMP29]], [[TMP30]]
	; CHECK-NEXT: [[TMP32:%.*]] = bitcast double* [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12]] to <2 x double>*			; CHECK-NEXT: [[TMP32:%.*]] = bitcast double* [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP28]], <2 x double>* [[TMP32]], align 8			; CHECK-NEXT: store <2 x double> [[TMP31]], <2 x double>* [[TMP32]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%arrayidx1.i = getelementptr inbounds [2 x double], [2 x double]* %A, i64 0, i64 0			%arrayidx1.i = getelementptr inbounds [2 x double], [2 x double]* %A, i64 0, i64 0
	%temp = load double, double* %arrayidx1.i, align 8			%temp = load double, double* %arrayidx1.i, align 8
	%arrayidx3.i = getelementptr inbounds [4 x double], [4 x double]* %B, i64 0, i64 0			%arrayidx3.i = getelementptr inbounds [4 x double], [4 x double]* %B, i64 0, i64 0
	%temp1 = load double, double* %arrayidx3.i, align 8			%temp1 = load double, double* %arrayidx3.i, align 8
	%mul.i = fmul double %temp, %temp1			%mul.i = fmul double %temp, %temp1
	%arrayidx5.i = getelementptr inbounds [2 x double], [2 x double]* %A, i64 0, i64 1			%arrayidx5.i = getelementptr inbounds [2 x double], [2 x double]* %A, i64 0, i64 1

llvm/test/Transforms/SLPVectorizer/AArch64/memory-runtime-checks.ll

	exit:			exit:
	ret void			ret void
	}			}

	define void @no_version(i32* nocapture %dst, i32* nocapture readonly %src) {			define void @no_version(i32* nocapture %dst, i32* nocapture readonly %src) {
	; CHECK-LABEL: @no_version(			; CHECK-LABEL: @no_version(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[SRC_GEP_1:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1			; CHECK-NEXT: [[SRC_GEP_1:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1
				; CHECK-NEXT: [[DST_GEP_1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <2 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <2 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = ashr <2 x i32> [[TMP1]], <i32 16, i32 16>			; CHECK-NEXT: [[TMP2:%.*]] = ashr <2 x i32> [[TMP1]], <i32 16, i32 16>
	; CHECK-NEXT: [[DST_GEP_1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[DST]] to <2 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[DST]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[TMP2]], <2 x i32>* [[TMP3]], align 4			; CHECK-NEXT: store <2 x i32> [[TMP2]], <2 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%src.0 = load i32, i32* %src, align 4			%src.0 = load i32, i32* %src, align 4
	%src.gep.1 = getelementptr inbounds i32, i32* %src, i64 1			%src.gep.1 = getelementptr inbounds i32, i32* %src, i64 1
	%src.1 = load i32, i32* %src.gep.1, align 4			%src.1 = load i32, i32* %src.gep.1, align 4
	[...]
	; CHECK-NEXT: [[B_GEP_11:%.*]] = getelementptr i8, i8* [[B]], i64 11			; CHECK-NEXT: [[B_GEP_11:%.*]] = getelementptr i8, i8* [[B]], i64 11
	; CHECK-NEXT: [[A_GEP_12:%.*]] = getelementptr i8, i8* [[A]], i64 12			; CHECK-NEXT: [[A_GEP_12:%.*]] = getelementptr i8, i8* [[A]], i64 12
	; CHECK-NEXT: [[B_GEP_12:%.*]] = getelementptr i8, i8* [[B]], i64 12			; CHECK-NEXT: [[B_GEP_12:%.*]] = getelementptr i8, i8* [[B]], i64 12
	; CHECK-NEXT: [[A_GEP_13:%.*]] = getelementptr i8, i8* [[A]], i64 13			; CHECK-NEXT: [[A_GEP_13:%.*]] = getelementptr i8, i8* [[A]], i64 13
	; CHECK-NEXT: [[B_GEP_13:%.*]] = getelementptr i8, i8* [[B]], i64 13			; CHECK-NEXT: [[B_GEP_13:%.*]] = getelementptr i8, i8* [[B]], i64 13
	; CHECK-NEXT: [[A_GEP_14:%.*]] = getelementptr i8, i8* [[A]], i64 14			; CHECK-NEXT: [[A_GEP_14:%.*]] = getelementptr i8, i8* [[A]], i64 14
	; CHECK-NEXT: [[B_GEP_14:%.*]] = getelementptr i8, i8* [[B]], i64 14			; CHECK-NEXT: [[B_GEP_14:%.*]] = getelementptr i8, i8* [[B]], i64 14
	; CHECK-NEXT: [[A_GEP_15:%.*]] = getelementptr i8, i8* [[A]], i64 15			; CHECK-NEXT: [[A_GEP_15:%.*]] = getelementptr i8, i8* [[A]], i64 15
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[A_GEP_0]] to <16 x i8>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[TMP0]], align 1
	; CHECK-NEXT: [[B_GEP_15:%.*]] = getelementptr i8, i8* [[B]], i64 15			; CHECK-NEXT: [[B_GEP_15:%.*]] = getelementptr i8, i8* [[B]], i64 15
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[B_GEP_0]] to <16 x i8>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[TMP2]], align 1
	; CHECK-NEXT: [[TMP4:%.*]] = xor <16 x i8> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[R_GEP_0:%.*]] = getelementptr i8, i8* [[ARG1]], i64 0			; CHECK-NEXT: [[R_GEP_0:%.*]] = getelementptr i8, i8* [[ARG1]], i64 0
	; CHECK-NEXT: [[R_GEP_1:%.*]] = getelementptr i8, i8* [[ARG1]], i64 1			; CHECK-NEXT: [[R_GEP_1:%.*]] = getelementptr i8, i8* [[ARG1]], i64 1
	; CHECK-NEXT: [[R_GEP_2:%.*]] = getelementptr i8, i8* [[ARG1]], i64 2			; CHECK-NEXT: [[R_GEP_2:%.*]] = getelementptr i8, i8* [[ARG1]], i64 2
	; CHECK-NEXT: [[R_GEP_3:%.*]] = getelementptr i8, i8* [[ARG1]], i64 3			; CHECK-NEXT: [[R_GEP_3:%.*]] = getelementptr i8, i8* [[ARG1]], i64 3
	; CHECK-NEXT: [[R_GEP_4:%.*]] = getelementptr i8, i8* [[ARG1]], i64 4			; CHECK-NEXT: [[R_GEP_4:%.*]] = getelementptr i8, i8* [[ARG1]], i64 4
	; CHECK-NEXT: [[R_GEP_5:%.*]] = getelementptr i8, i8* [[ARG1]], i64 5			; CHECK-NEXT: [[R_GEP_5:%.*]] = getelementptr i8, i8* [[ARG1]], i64 5
	; CHECK-NEXT: [[R_GEP_6:%.*]] = getelementptr i8, i8* [[ARG1]], i64 6			; CHECK-NEXT: [[R_GEP_6:%.*]] = getelementptr i8, i8* [[ARG1]], i64 6
	; CHECK-NEXT: [[R_GEP_7:%.*]] = getelementptr i8, i8* [[ARG1]], i64 7			; CHECK-NEXT: [[R_GEP_7:%.*]] = getelementptr i8, i8* [[ARG1]], i64 7
	; CHECK-NEXT: [[R_GEP_8:%.*]] = getelementptr i8, i8* [[ARG1]], i64 8			; CHECK-NEXT: [[R_GEP_8:%.*]] = getelementptr i8, i8* [[ARG1]], i64 8
	; CHECK-NEXT: [[R_GEP_9:%.*]] = getelementptr i8, i8* [[ARG1]], i64 9			; CHECK-NEXT: [[R_GEP_9:%.*]] = getelementptr i8, i8* [[ARG1]], i64 9
	; CHECK-NEXT: [[R_GEP_10:%.*]] = getelementptr i8, i8* [[ARG1]], i64 10			; CHECK-NEXT: [[R_GEP_10:%.*]] = getelementptr i8, i8* [[ARG1]], i64 10
	; CHECK-NEXT: [[R_GEP_11:%.*]] = getelementptr i8, i8* [[ARG1]], i64 11			; CHECK-NEXT: [[R_GEP_11:%.*]] = getelementptr i8, i8* [[ARG1]], i64 11
	; CHECK-NEXT: [[R_GEP_12:%.*]] = getelementptr i8, i8* [[ARG1]], i64 12			; CHECK-NEXT: [[R_GEP_12:%.*]] = getelementptr i8, i8* [[ARG1]], i64 12
	; CHECK-NEXT: [[R_GEP_13:%.*]] = getelementptr i8, i8* [[ARG1]], i64 13			; CHECK-NEXT: [[R_GEP_13:%.*]] = getelementptr i8, i8* [[ARG1]], i64 13
	; CHECK-NEXT: [[R_GEP_14:%.*]] = getelementptr i8, i8* [[ARG1]], i64 14			; CHECK-NEXT: [[R_GEP_14:%.*]] = getelementptr i8, i8* [[ARG1]], i64 14
	; CHECK-NEXT: [[R_GEP_15:%.*]] = getelementptr i8, i8* [[ARG1]], i64 15			; CHECK-NEXT: [[R_GEP_15:%.*]] = getelementptr i8, i8* [[ARG1]], i64 15
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[A_GEP_0]] to <16 x i8>*
				; CHECK-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* [[TMP0]], align 1
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[B_GEP_0]] to <16 x i8>*
				; CHECK-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* [[TMP2]], align 1
				; CHECK-NEXT: [[TMP4:%.*]] = xor <16 x i8> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i8* [[R_GEP_0]] to <16 x i8>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i8* [[R_GEP_0]] to <16 x i8>*
	; CHECK-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* [[TMP5]], align 1			; CHECK-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* [[TMP5]], align 1
	; CHECK-NEXT: [[T21:%.*]] = getelementptr inbounds i8, i8* [[ARG3]], i64 15			; CHECK-NEXT: [[T21:%.*]] = getelementptr inbounds i8, i8* [[ARG3]], i64 15
	; CHECK-NEXT: [[T22:%.*]] = bitcast i8* [[ARG3]] to <16 x i8>*			; CHECK-NEXT: [[T22:%.*]] = bitcast i8* [[ARG3]] to <16 x i8>*
	; CHECK-NEXT: call void @foo(i8* nonnull [[T4]])			; CHECK-NEXT: call void @foo(i8* nonnull [[T4]])
	; CHECK-NEXT: [[T26:%.*]] = load i8, i8* [[ARG3]], align 1			; CHECK-NEXT: [[T26:%.*]] = load i8, i8* [[ARG3]], align 1
	; CHECK-NEXT: [[T27:%.*]] = load i8, i8* [[ARG2:%.*]], align 1			; CHECK-NEXT: [[T27:%.*]] = load i8, i8* [[ARG2:%.*]], align 1
	; CHECK-NEXT: [[T28:%.*]] = xor i8 [[T27]], [[T26]]			; CHECK-NEXT: [[T28:%.*]] = xor i8 [[T27]], [[T26]]

llvm/test/Transforms/SLPVectorizer/AArch64/sdiv-pow2.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a57 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a57 | FileCheck %s
	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	define void @test1(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32* noalias nocapture readonly %c) {			define void @test1(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32* noalias nocapture readonly %c) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2			; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
				; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
				; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = sdiv <4 x i32> [[TMP4]], <i32 2, i32 2, i32 2, i32 2>			; CHECK-NEXT: [[TMP5:%.*]] = sdiv <4 x i32> [[TMP4]], <i32 2, i32 2, i32 2, i32 2>
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i32, i32* %b, align 4			%0 = load i32, i32* %b, align 4
	%1 = load i32, i32* %c, align 4			%1 = load i32, i32* %c, align 4
	%add = add nsw i32 %1, %0			%add = add nsw i32 %1, %0

llvm/test/Transforms/SLPVectorizer/AArch64/slp-and-reduction.ll

	; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 3			; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 4
	; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 6
	; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 7
				; CHECK-NEXT: [[ARRAYIDX3_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ARRAYIDX]] to <8 x i8>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ARRAYIDX]] to <8 x i8>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 1			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 1
	; CHECK-NEXT: [[ARRAYIDX3_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[ARRAYIDX3]] to <8 x i8>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[ARRAYIDX3]] to <8 x i8>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[TMP2]], align 1			; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[TMP2]], align 1
	; CHECK-NEXT: [[TMP4:%.*]] = xor <8 x i8> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = xor <8 x i8> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = call i8 @llvm.vector.reduce.and.v8i8(<8 x i8> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i8 @llvm.vector.reduce.and.v8i8(<8 x i8> [[TMP4]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i8 [[TMP5]], 1			; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i8 [[TMP5]], 1
	; CHECK-NEXT: ret i8 [[OP_EXTRA]]			; CHECK-NEXT: ret i8 [[OP_EXTRA]]
	;			;
	entry:			entry:

llvm/test/Transforms/SLPVectorizer/AArch64/slp-or-reduction.ll

	; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 3			; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 4
	; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 6
	; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 7
				; CHECK-NEXT: [[ARRAYIDX3_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ARRAYIDX]] to <8 x i8>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ARRAYIDX]] to <8 x i8>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 1			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 1
	; CHECK-NEXT: [[ARRAYIDX3_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[ARRAYIDX3]] to <8 x i8>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[ARRAYIDX3]] to <8 x i8>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[TMP2]], align 1			; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[TMP2]], align 1
	; CHECK-NEXT: [[TMP4:%.*]] = xor <8 x i8> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = xor <8 x i8> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = call i8 @llvm.vector.reduce.or.v8i8(<8 x i8> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i8 @llvm.vector.reduce.or.v8i8(<8 x i8> [[TMP4]])
	; CHECK-NEXT: ret i8 [[TMP5]]			; CHECK-NEXT: ret i8 [[TMP5]]
	;			;

	entry:			entry:

llvm/test/Transforms/SLPVectorizer/AArch64/slp-xor-reduction.ll

	; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 3			; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 4
	; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 6
	; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[A]], i64 0, i32 0, i64 7
				; CHECK-NEXT: [[ARRAYIDX3_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ARRAYIDX]] to <8 x i8>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ARRAYIDX]] to <8 x i8>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 1			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 1
	; CHECK-NEXT: [[ARRAYIDX3_7:%.*]] = getelementptr inbounds [[STRUCT_BUF]], %struct.buf* [[B]], i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[ARRAYIDX3]] to <8 x i8>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[ARRAYIDX3]] to <8 x i8>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[TMP2]], align 1			; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* [[TMP2]], align 1
	; CHECK-NEXT: [[TMP4:%.*]] = and <8 x i8> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = and <8 x i8> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = call i8 @llvm.vector.reduce.xor.v8i8(<8 x i8> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i8 @llvm.vector.reduce.xor.v8i8(<8 x i8> [[TMP4]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = xor i8 [[TMP5]], 1			; CHECK-NEXT: [[OP_EXTRA:%.*]] = xor i8 [[TMP5]], 1
	; CHECK-NEXT: ret i8 [[OP_EXTRA]]			; CHECK-NEXT: ret i8 [[OP_EXTRA]]
	;			;
	entry:			entry:

llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-di.ll

	; CHECK-LABEL: @patatino(			; CHECK-LABEL: @patatino(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[N:%.*]], metadata [[META18:![0-9]+]], metadata !DIExpression()), !dbg [[DBG23:![0-9]+]]			; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[N:%.*]], metadata [[META18:![0-9]+]], metadata !DIExpression()), !dbg [[DBG23:![0-9]+]]
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[I:%.*]], metadata [[META19:![0-9]+]], metadata !DIExpression()), !dbg [[DBG24:![0-9]+]]			; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[I:%.*]], metadata [[META19:![0-9]+]], metadata !DIExpression()), !dbg [[DBG24:![0-9]+]]
	; CHECK-NEXT: call void @llvm.dbg.value(metadata %struct.S* [[P:%.*]], metadata [[META20:![0-9]+]], metadata !DIExpression()), !dbg [[DBG25:![0-9]+]]			; CHECK-NEXT: call void @llvm.dbg.value(metadata %struct.S* [[P:%.*]], metadata [[META20:![0-9]+]], metadata !DIExpression()), !dbg [[DBG25:![0-9]+]]
	; CHECK-NEXT: [[X1:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P]], i64 [[N]], i32 0, !dbg [[DBG26:![0-9]+]]			; CHECK-NEXT: [[X1:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P]], i64 [[N]], i32 0, !dbg [[DBG26:![0-9]+]]
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 undef, metadata [[META21:![0-9]+]], metadata !DIExpression()), !dbg [[DBG27:![0-9]+]]			; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 undef, metadata [[META21:![0-9]+]], metadata !DIExpression()), !dbg [[DBG27:![0-9]+]]
	; CHECK-NEXT: [[Y3:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[N]], i32 1, !dbg [[DBG28:![0-9]+]]			; CHECK-NEXT: [[Y3:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[N]], i32 1, !dbg [[DBG28:![0-9]+]]
				; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 undef, metadata [[META22:![0-9]+]], metadata !DIExpression()), !dbg [[DBG29:![0-9]+]]
				; CHECK-NEXT: [[X5:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 0, !dbg [[DBG30:![0-9]+]]
				; CHECK-NEXT: [[Y7:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 1, !dbg [[DBG31:![0-9]+]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[X1]] to <2 x i64>*, !dbg [[DBG26]]			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[X1]] to <2 x i64>*, !dbg [[DBG26]]
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8, !dbg [[DBG26]], !tbaa [[TBAA29:![0-9]+]]			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8, !dbg [[DBG26]], !tbaa [[TBAA32:![0-9]+]]
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 undef, metadata [[META22:![0-9]+]], metadata !DIExpression()), !dbg [[DBG33:![0-9]+]]
	; CHECK-NEXT: [[X5:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 0, !dbg [[DBG34:![0-9]+]]
	; CHECK-NEXT: [[Y7:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 1, !dbg [[DBG35:![0-9]+]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[X5]] to <2 x i64>*, !dbg [[DBG36:![0-9]+]]			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[X5]] to <2 x i64>*, !dbg [[DBG36:![0-9]+]]
	; CHECK-NEXT: store <2 x i64> [[TMP1]], <2 x i64>* [[TMP2]], align 8, !dbg [[DBG36]], !tbaa [[TBAA29]]			; CHECK-NEXT: store <2 x i64> [[TMP1]], <2 x i64>* [[TMP2]], align 8, !dbg [[DBG36]], !tbaa [[TBAA32]]
	; CHECK-NEXT: ret void, !dbg [[DBG37:![0-9]+]]			; CHECK-NEXT: ret void, !dbg [[DBG37:![0-9]+]]
	;			;
	entry:			entry:
	call void @llvm.dbg.value(metadata i64 %n, metadata !18, metadata !DIExpression()), !dbg !23			call void @llvm.dbg.value(metadata i64 %n, metadata !18, metadata !DIExpression()), !dbg !23
	call void @llvm.dbg.value(metadata i64 %i, metadata !19, metadata !DIExpression()), !dbg !24			call void @llvm.dbg.value(metadata i64 %i, metadata !19, metadata !DIExpression()), !dbg !24
	call void @llvm.dbg.value(metadata %struct.S* %p, metadata !20, metadata !DIExpression()), !dbg !25			call void @llvm.dbg.value(metadata %struct.S* %p, metadata !20, metadata !DIExpression()), !dbg !25
	%x1 = getelementptr inbounds %struct.S, %struct.S* %p, i64 %n, i32 0, !dbg !26			%x1 = getelementptr inbounds %struct.S, %struct.S* %p, i64 %n, i32 0, !dbg !26
	%0 = load i64, i64* %x1, align 8, !dbg !26, !tbaa !27			%0 = load i64, i64* %x1, align 8, !dbg !26, !tbaa !27

llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-order.ll

	define void @test(i64* %ptr, i64* noalias %res) {			define void @test(i64* %ptr, i64* noalias %res) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[CALL_I_I:%.*]] = call i32* @get_ptr()			; CHECK-NEXT: [[CALL_I_I:%.*]] = call i32* @get_ptr()
	; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr i32, i32* [[CALL_I_I]], i32 2			; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr i32, i32* [[CALL_I_I]], i32 2
	; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr i32, i32* [[CALL_I_I]], i32 1			; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr i32, i32* [[CALL_I_I]], i32 1
				; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr i32, i32* [[CALL_I_I]], i32 3
				; CHECK-NEXT: [[RES_1:%.*]] = getelementptr i64, i64* [[RES:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[CALL_I_I]] to <2 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[CALL_I_I]] to <2 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 2			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 2
	; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr i32, i32* [[CALL_I_I]], i32 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[GEP_1]] to <2 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[GEP_1]] to <2 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 2			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 2
	; CHECK-NEXT: [[TMP4:%.*]] = zext <2 x i32> [[TMP1]] to <2 x i64>			; CHECK-NEXT: [[TMP4:%.*]] = zext <2 x i32> [[TMP1]] to <2 x i64>
	; CHECK-NEXT: [[TMP5:%.*]] = zext <2 x i32> [[TMP3]] to <2 x i64>			; CHECK-NEXT: [[TMP5:%.*]] = zext <2 x i32> [[TMP3]] to <2 x i64>
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <2 x i64> [[TMP4]], [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <2 x i64> [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[RES_1:%.*]] = getelementptr i64, i64* [[RES:%.*]], i64 1
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i64* [[RES]] to <2 x i64>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i64* [[RES]] to <2 x i64>*
	; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* [[TMP7]], align 8			; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* [[TMP7]], align 8
	; CHECK-NEXT: [[C:%.*]] = call i1 @cond()			; CHECK-NEXT: [[C:%.*]] = call i1 @cond()
	; CHECK-NEXT: br i1 [[C]], label [[FOR_BODY]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[C]], label [[FOR_BODY]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

	define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {			define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {
	; CHECK-LABEL: @build_vec_v4i32_reuse_1(			; CHECK-LABEL: @build_vec_v4i32_reuse_1(
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[V1:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[V1:%.*]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[V1]], i64 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[V1]], i64 0
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i64 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i64 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i64 0
	; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i64 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i64 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i64 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i64 0			; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = xor <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP8]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]			; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]
	;			;
	%v0.0 = extractelement <2 x i32> %v0, i32 0			%v0.0 = extractelement <2 x i32> %v0, i32 0
	%v0.1 = extractelement <2 x i32> %v0, i32 1			%v0.1 = extractelement <2 x i32> %v0, i32 1
	%v1.0 = extractelement <2 x i32> %v1, i32 0			%v1.0 = extractelement <2 x i32> %v1, i32 0
	%v1.1 = extractelement <2 x i32> %v1, i32 1			%v1.1 = extractelement <2 x i32> %v1, i32 1
	Show All 13 Lines
	}			}

	define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {			define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {
	; CHECK-LABEL: @build_vec_v4i32_3_binops(			; CHECK-LABEL: @build_vec_v4i32_3_binops(
	; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]
	; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP6:%.*]] = xor <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP6:%.*]] = xor <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[SHUFFLE]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = xor <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[SHUFFLE]], [[TMP7]]
				; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]			; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]
	;			;
	%v0.0 = extractelement <2 x i32> %v0, i32 0			%v0.0 = extractelement <2 x i32> %v0, i32 0
	%v0.1 = extractelement <2 x i32> %v0, i32 1			%v0.1 = extractelement <2 x i32> %v0, i32 1
	%v1.0 = extractelement <2 x i32> %v1, i32 0			%v1.0 = extractelement <2 x i32> %v1, i32 0
	%v1.1 = extractelement <2 x i32> %v1, i32 1			%v1.1 = extractelement <2 x i32> %v1, i32 1
	%tmp0.0 = add i32 %v0.0, %v1.0			%tmp0.0 = add i32 %v0.0, %v1.0
	%tmp0.1 = add i32 %v0.1, %v1.1			%tmp0.1 = add i32 %v0.1, %v1.1

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

	define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {			define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {
	; CHECK-LABEL: @build_vec_v4i32_reuse_1(			; CHECK-LABEL: @build_vec_v4i32_reuse_1(
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[V1:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[V1:%.*]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[V1]], i64 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[V1]], i64 0
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i64 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[V0:%.*]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i64 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[V0]], i64 0
	; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i64 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_0]], i64 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i64 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0_1]], i64 0			; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = xor <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP5]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = sub <2 x i32> [[TMP8]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP2_31:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]			; CHECK-NEXT: ret <4 x i32> [[TMP2_31]]
	;			;
	%v0.0 = extractelement <2 x i32> %v0, i32 0			%v0.0 = extractelement <2 x i32> %v0, i32 0
	%v0.1 = extractelement <2 x i32> %v0, i32 1			%v0.1 = extractelement <2 x i32> %v0, i32 1
	%v1.0 = extractelement <2 x i32> %v1, i32 0			%v1.0 = extractelement <2 x i32> %v1, i32 0
	%v1.1 = extractelement <2 x i32> %v1, i32 1			%v1.1 = extractelement <2 x i32> %v1, i32 1
	Show All 13 Lines
	}			}

	define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {			define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {
	; CHECK-LABEL: @build_vec_v4i32_3_binops(			; CHECK-LABEL: @build_vec_v4i32_3_binops(
	; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]
	; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP6:%.*]] = xor <2 x i32> [[V0]], [[V1]]			; CHECK-NEXT: [[TMP6:%.*]] = xor <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[SHUFFLE]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = xor <2 x i32> [[V0]], [[V1]]
	; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[SHUFFLE]], [[TMP7]]
				; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]			; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]
	;			;
	%v0.0 = extractelement <2 x i32> %v0, i32 0			%v0.0 = extractelement <2 x i32> %v0, i32 0
	%v0.1 = extractelement <2 x i32> %v0, i32 1			%v0.1 = extractelement <2 x i32> %v0, i32 1
	%v1.0 = extractelement <2 x i32> %v1, i32 0			%v1.0 = extractelement <2 x i32> %v1, i32 0
	%v1.1 = extractelement <2 x i32> %v1, i32 1			%v1.1 = extractelement <2 x i32> %v1, i32 1
	%tmp0.0 = add i32 %v0.0, %v1.0			%tmp0.0 = add i32 %v0.0, %v1.0
	%tmp0.1 = add i32 %v0.1, %v1.1			%tmp0.1 = add i32 %v0.1, %v1.1

llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s352.ll

	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[PREHEADER]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[PREHEADER]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[DOT_115:%.*]] = phi float [ 0.000000e+00, [[PREHEADER]] ], [ [[ADD39:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[DOT_115:%.*]] = phi float [ 0.000000e+00, [[PREHEADER]] ], [ [[ADD39:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA:%.*]], %struct.GlobalData* @global_data, i64 0, i32 0, i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA:%.*]], %struct.GlobalData* @global_data, i64 0, i32 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 3, i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 3, i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP0:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[TMP0:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 0, i64 [[TMP0]]			; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 0, i64 [[TMP0]]
				; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 3, i64 [[TMP0]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 3, i64 [[TMP0]]
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX6]] to <2 x float>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX6]] to <2 x float>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[DOT_115]], [[TMP6]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[DOT_115]], [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 1
	; CHECK-NEXT: [[ADD15:%.*]] = fadd float [[ADD]], [[TMP7]]			; CHECK-NEXT: [[ADD15:%.*]] = fadd float [[ADD]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[TMP8:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 0, i64 [[TMP8]]			; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 0, i64 [[TMP8]]
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 3, i64 [[TMP8]]			; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 3, i64 [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[TMP9:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 0, i64 [[TMP9]]			; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 0, i64 [[TMP9]]
				; CHECK-NEXT: [[ARRAYIDX29:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 3, i64 [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast float* [[ARRAYIDX18]] to <2 x float>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast float* [[ARRAYIDX18]] to <2 x float>*
	; CHECK-NEXT: [[TMP11:%.*]] = load <2 x float>, <2 x float>* [[TMP10]], align 4			; CHECK-NEXT: [[TMP11:%.*]] = load <2 x float>, <2 x float>* [[TMP10]], align 4
	; CHECK-NEXT: [[ARRAYIDX29:%.*]] = getelementptr inbounds [[STRUCT_GLOBALDATA]], %struct.GlobalData* @global_data, i64 0, i32 3, i64 [[TMP9]]
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast float* [[ARRAYIDX21]] to <2 x float>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast float* [[ARRAYIDX21]] to <2 x float>*
	; CHECK-NEXT: [[TMP13:%.*]] = load <2 x float>, <2 x float>* [[TMP12]], align 4			; CHECK-NEXT: [[TMP13:%.*]] = load <2 x float>, <2 x float>* [[TMP12]], align 4
	; CHECK-NEXT: [[TMP14:%.*]] = fmul <2 x float> [[TMP11]], [[TMP13]]			; CHECK-NEXT: [[TMP14:%.*]] = fmul <2 x float> [[TMP11]], [[TMP13]]
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP14]], i32 0			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP14]], i32 0
	; CHECK-NEXT: [[ADD23:%.*]] = fadd float [[ADD15]], [[TMP15]]			; CHECK-NEXT: [[ADD23:%.*]] = fadd float [[ADD15]], [[TMP15]]
	; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP14]], i32 1			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP14]], i32 1
	; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD23]], [[TMP16]]			; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD23]], [[TMP16]]
	; CHECK-NEXT: [[TMP17:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 4			; CHECK-NEXT: [[TMP17:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 4

llvm/test/Transforms/SLPVectorizer/AArch64/widen.ll

	; CHECK-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 8			; CHECK-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 8
	; CHECK-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 9			; CHECK-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 9
	; CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 10			; CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 10
	; CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 11			; CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 11
	; CHECK-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 12			; CHECK-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 12
	; CHECK-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 13			; CHECK-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 13
	; CHECK-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 14			; CHECK-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 14
	; CHECK-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 15			; CHECK-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 15
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[A]] to <8 x i8>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[ARRAYIDX_8]] to <8 x i8>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[TMP3]], align 1
	; CHECK-NEXT: [[TMP5:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i16>
	; CHECK-NEXT: [[TMP6:%.*]] = zext <8 x i8> [[TMP4]] to <8 x i16>
	; CHECK-NEXT: [[TMP7:%.*]] = shl nuw <8 x i16> [[TMP5]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
	; CHECK-NEXT: [[TMP8:%.*]] = shl nuw <8 x i16> [[TMP6]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
	; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i16, i16* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds i16, i16* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 3
	; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 4			; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 4
	; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 5			; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 5
	; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 6			; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 6
	; CHECK-NEXT: [[ARRAYIDX3_7:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 7			; CHECK-NEXT: [[ARRAYIDX3_7:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 7
	; CHECK-NEXT: [[ARRAYIDX3_8:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 8			; CHECK-NEXT: [[ARRAYIDX3_8:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 8
	; CHECK-NEXT: [[ARRAYIDX3_9:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 9			; CHECK-NEXT: [[ARRAYIDX3_9:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 9
	; CHECK-NEXT: [[ARRAYIDX3_10:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 10			; CHECK-NEXT: [[ARRAYIDX3_10:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 10
	; CHECK-NEXT: [[ARRAYIDX3_11:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 11			; CHECK-NEXT: [[ARRAYIDX3_11:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 11
	; CHECK-NEXT: [[ARRAYIDX3_12:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 12			; CHECK-NEXT: [[ARRAYIDX3_12:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 12
	; CHECK-NEXT: [[ARRAYIDX3_13:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 13			; CHECK-NEXT: [[ARRAYIDX3_13:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 13
	; CHECK-NEXT: [[ARRAYIDX3_14:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 14			; CHECK-NEXT: [[ARRAYIDX3_14:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 14
	; CHECK-NEXT: [[ARRAYIDX3_15:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 15			; CHECK-NEXT: [[ARRAYIDX3_15:%.*]] = getelementptr inbounds i16, i16* [[B]], i64 15
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast i16* [[B]] to <8 x i16>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[A]] to <8 x i8>*
	; CHECK-NEXT: store <8 x i16> [[TMP7]], <8 x i16>* [[TMP9]], align 2			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
				; CHECK-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i16>
				; CHECK-NEXT: [[TMP4:%.*]] = shl nuw <8 x i16> [[TMP3]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
				; CHECK-NEXT: [[TMP5:%.*]] = bitcast i16* [[B]] to <8 x i16>*
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast i8* [[ARRAYIDX_8]] to <8 x i8>*
				; CHECK-NEXT: [[TMP7:%.*]] = load <8 x i8>, <8 x i8>* [[TMP6]], align 1
				; CHECK-NEXT: [[TMP8:%.*]] = zext <8 x i8> [[TMP7]] to <8 x i16>
				; CHECK-NEXT: [[TMP9:%.*]] = shl nuw <8 x i16> [[TMP8]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
				; CHECK-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* [[TMP5]], align 2
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[ARRAYIDX3_8]] to <8 x i16>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[ARRAYIDX3_8]] to <8 x i16>*
	; CHECK-NEXT: store <8 x i16> [[TMP8]], <8 x i16>* [[TMP10]], align 2			; CHECK-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* [[TMP10]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%arrayidx.1 = getelementptr inbounds i8, i8* %a, i64 1			%arrayidx.1 = getelementptr inbounds i8, i8* %a, i64 1
	%arrayidx.2 = getelementptr inbounds i8, i8* %a, i64 2			%arrayidx.2 = getelementptr inbounds i8, i8* %a, i64 2
	%arrayidx.3 = getelementptr inbounds i8, i8* %a, i64 3			%arrayidx.3 = getelementptr inbounds i8, i8* %a, i64 3
	%arrayidx.4 = getelementptr inbounds i8, i8* %a, i64 4			%arrayidx.4 = getelementptr inbounds i8, i8* %a, i64 4
	%arrayidx.5 = getelementptr inbounds i8, i8* %a, i64 5			%arrayidx.5 = getelementptr inbounds i8, i8* %a, i64 5
	%arrayidx.7 = getelementptr inbounds i8, i8* %a, i64 7			%arrayidx.7 = getelementptr inbounds i8, i8* %a, i64 7

llvm/test/Transforms/SLPVectorizer/AMDGPU/packed-math.ll

Show First 20 Lines • Show All 185 Lines • ▼ Show 20 Lines	;
store half %fma1, half addrspace(3)* %arrayidx6, align 2		store half %fma1, half addrspace(3)* %arrayidx6, align 2
ret void		ret void
}		}

define amdgpu_kernel void @test1_fabs_scalar_fma_v2f16(half addrspace(3)* %a, half addrspace(3)* %b, half addrspace(3)* %c, half addrspace(3)* %d) {		define amdgpu_kernel void @test1_fabs_scalar_fma_v2f16(half addrspace(3)* %a, half addrspace(3)* %b, half addrspace(3)* %c, half addrspace(3)* %d) {
; GCN-LABEL: @test1_fabs_scalar_fma_v2f16(		; GCN-LABEL: @test1_fabs_scalar_fma_v2f16(
; GCN-NEXT: [[I1:%.*]] = load half, half addrspace(3)* [[B:%.*]], align 2		; GCN-NEXT: [[I1:%.*]] = load half, half addrspace(3)* [[B:%.*]], align 2
; GCN-NEXT: [[I1_FABS:%.*]] = call half @llvm.fabs.f16(half [[I1]])		; GCN-NEXT: [[I1_FABS:%.*]] = call half @llvm.fabs.f16(half [[I1]])
; GCN-NEXT: [[TMP1:%.*]] = bitcast half addrspace(3)* [[A:%.*]] to <2 x half> addrspace(3)*
; GCN-NEXT: [[TMP2:%.*]] = load <2 x half>, <2 x half> addrspace(3)* [[TMP1]], align 2
; GCN-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds half, half addrspace(3)* [[B]], i64 1		; GCN-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds half, half addrspace(3)* [[B]], i64 1
; GCN-NEXT: [[I4:%.*]] = load half, half addrspace(3)* [[ARRAYIDX4]], align 2		; GCN-NEXT: [[I4:%.*]] = load half, half addrspace(3)* [[ARRAYIDX4]], align 2
		; GCN-NEXT: [[TMP1:%.*]] = bitcast half addrspace(3)* [[A:%.*]] to <2 x half> addrspace(3)*
		; GCN-NEXT: [[TMP2:%.*]] = load <2 x half>, <2 x half> addrspace(3)* [[TMP1]], align 2
; GCN-NEXT: [[TMP3:%.*]] = bitcast half addrspace(3)* [[C:%.*]] to <2 x half> addrspace(3)*		; GCN-NEXT: [[TMP3:%.*]] = bitcast half addrspace(3)* [[C:%.*]] to <2 x half> addrspace(3)*
; GCN-NEXT: [[TMP4:%.*]] = load <2 x half>, <2 x half> addrspace(3)* [[TMP3]], align 2		; GCN-NEXT: [[TMP4:%.*]] = load <2 x half>, <2 x half> addrspace(3)* [[TMP3]], align 2
; GCN-NEXT: [[TMP5:%.*]] = insertelement <2 x half> poison, half [[I1_FABS]], i32 0		; GCN-NEXT: [[TMP5:%.*]] = insertelement <2 x half> poison, half [[I1_FABS]], i32 0
; GCN-NEXT: [[TMP6:%.*]] = insertelement <2 x half> [[TMP5]], half [[I4]], i32 1		; GCN-NEXT: [[TMP6:%.*]] = insertelement <2 x half> [[TMP5]], half [[I4]], i32 1
; GCN-NEXT: [[TMP7:%.*]] = call <2 x half> @llvm.fma.v2f16(<2 x half> [[TMP2]], <2 x half> [[TMP6]], <2 x half> [[TMP4]])		; GCN-NEXT: [[TMP7:%.*]] = call <2 x half> @llvm.fma.v2f16(<2 x half> [[TMP2]], <2 x half> [[TMP6]], <2 x half> [[TMP4]])
; GCN-NEXT: [[TMP8:%.*]] = bitcast half addrspace(3)* [[D:%.*]] to <2 x half> addrspace(3)*		; GCN-NEXT: [[TMP8:%.*]] = bitcast half addrspace(3)* [[D:%.*]] to <2 x half> addrspace(3)*
; GCN-NEXT: store <2 x half> [[TMP7]], <2 x half> addrspace(3)* [[TMP8]], align 2		; GCN-NEXT: store <2 x half> [[TMP7]], <2 x half> addrspace(3)* [[TMP8]], align 2
; GCN-NEXT: ret void		; GCN-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/NVPTX/v2f16.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 | FileCheck %s
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=nvptx64-nvidia-cuda -mcpu=sm_40 | FileCheck %s -check-prefix=NOVECTOR			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=nvptx64-nvidia-cuda -mcpu=sm_40 | FileCheck %s -check-prefix=NOVECTOR

	define void @fusion(i8* noalias nocapture align 256 dereferenceable(19267584) %arg, i8* noalias nocapture readonly align 256 dereferenceable(19267584) %arg1, i32 %arg2, i32 %arg3) local_unnamed_addr #0 {			define void @fusion(i8* noalias nocapture align 256 dereferenceable(19267584) %arg, i8* noalias nocapture readonly align 256 dereferenceable(19267584) %arg1, i32 %arg2, i32 %arg3) local_unnamed_addr #0 {
	; CHECK-LABEL: @fusion(			; CHECK-LABEL: @fusion(
	; CHECK-NEXT: [[TMP:%.*]] = shl nuw nsw i32 [[ARG2:%.*]], 6			; CHECK-NEXT: [[TMP:%.*]] = shl nuw nsw i32 [[ARG2:%.*]], 6
	; CHECK-NEXT: [[TMP4:%.*]] = or i32 [[TMP]], [[ARG3:%.*]]			; CHECK-NEXT: [[TMP4:%.*]] = or i32 [[TMP]], [[ARG3:%.*]]
	; CHECK-NEXT: [[TMP5:%.*]] = shl nuw nsw i32 [[TMP4]], 2			; CHECK-NEXT: [[TMP5:%.*]] = shl nuw nsw i32 [[TMP4]], 2
	; CHECK-NEXT: [[TMP6:%.*]] = zext i32 [[TMP5]] to i64			; CHECK-NEXT: [[TMP6:%.*]] = zext i32 [[TMP5]] to i64
	; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP6]], 1			; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP6]], 1
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast i8* [[ARG1:%.*]] to half*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast i8* [[ARG1:%.*]] to half*
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds half, half* [[TMP10]], i64 [[TMP6]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds half, half* [[TMP10]], i64 [[TMP6]]
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast i8* [[ARG:%.*]] to half*			; CHECK-NEXT: [[TMP15:%.*]] = bitcast i8* [[ARG:%.*]] to half*
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds half, half* [[TMP15]], i64 [[TMP6]]			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds half, half* [[TMP15]], i64 [[TMP6]]
	; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds half, half* [[TMP10]], i64 [[TMP7]]			; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds half, half* [[TMP10]], i64 [[TMP7]]
				; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds half, half* [[TMP15]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast half* [[TMP11]] to <2 x half>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast half* [[TMP11]] to <2 x half>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x half>, <2 x half>* [[TMP1]], align 8			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x half>, <2 x half>* [[TMP1]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x half> [[TMP2]], <half 0xH5380, half 0xH5380>			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x half> [[TMP2]], <half 0xH5380, half 0xH5380>
	; CHECK-NEXT: [[TMP4:%.*]] = fadd fast <2 x half> [[TMP3]], <half 0xH57F0, half 0xH57F0>			; CHECK-NEXT: [[TMP4:%.*]] = fadd fast <2 x half> [[TMP3]], <half 0xH57F0, half 0xH57F0>
	; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds half, half* [[TMP15]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast half* [[TMP16]] to <2 x half>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast half* [[TMP16]] to <2 x half>*
	; CHECK-NEXT: store <2 x half> [[TMP4]], <2 x half>* [[TMP5]], align 8			; CHECK-NEXT: store <2 x half> [[TMP4]], <2 x half>* [[TMP5]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; NOVECTOR-LABEL: @fusion(			; NOVECTOR-LABEL: @fusion(
	; NOVECTOR-NEXT: [[TMP:%.*]] = shl nuw nsw i32 [[ARG2:%.*]], 6			; NOVECTOR-NEXT: [[TMP:%.*]] = shl nuw nsw i32 [[ARG2:%.*]], 6
	; NOVECTOR-NEXT: [[TMP4:%.*]] = or i32 [[TMP]], [[ARG3:%.*]]			; NOVECTOR-NEXT: [[TMP4:%.*]] = or i32 [[TMP]], [[ARG3:%.*]]
	; NOVECTOR-NEXT: [[TMP5:%.*]] = shl nuw nsw i32 [[TMP4]], 2			; NOVECTOR-NEXT: [[TMP5:%.*]] = shl nuw nsw i32 [[TMP4]], 2

llvm/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -mtriple=systemz-unknown -mcpu=z13 -slp-vectorizer -S < %s | FileCheck %s			; RUN: opt -mtriple=systemz-unknown -mcpu=z13 -slp-vectorizer -S < %s | FileCheck %s

	@bar = external global [4 x [4 x i32]], align 4			@bar = external global [4 x [4 x i32]], align 4
	@dct_luma = external global [4 x [4 x i32]], align 4			@dct_luma = external global [4 x [4 x i32]], align 4

	define void @foo() local_unnamed_addr {			define void @foo() local_unnamed_addr {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ADD277:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD277:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: store i32 [[ADD277]], i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4			; CHECK-NEXT: store i32 [[ADD277]], i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4
	; CHECK-NEXT: [[ARRAYIDX372:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 0			; CHECK-NEXT: [[ARRAYIDX372:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 0
	; CHECK-NEXT: [[ARRAYIDX372_1:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 1			; CHECK-NEXT: [[ARRAYIDX372_1:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 1
	; CHECK-NEXT: [[ARRAYIDX372_2:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 2			; CHECK-NEXT: [[ARRAYIDX372_2:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 2
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 2) to <2 x i32>*), align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 2) to <2 x i32>*), align 4
				; CHECK-NEXT: [[ARRAYIDX372_3:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ADD277]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ADD277]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> poison, [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> poison, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = ashr <4 x i32> [[TMP6]], <i32 6, i32 6, i32 6, i32 6>			; CHECK-NEXT: [[TMP7:%.*]] = ashr <4 x i32> [[TMP6]], <i32 6, i32 6, i32 6, i32 6>
	; CHECK-NEXT: [[ARRAYIDX372_3:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* @dct_luma, i64 0, i64 3, i64 3
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX372]] to <4 x i32>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX372]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	%add277 = add nsw i32 undef, undef			%add277 = add nsw i32 undef, undef
	store i32 %add277, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4			store i32 %add277, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 1), align 4
	%0 = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4			%0 = load i32, i32* getelementptr inbounds ([4 x [4 x i32]], [4 x [4 x i32]]* @bar, i64 0, i64 3, i64 0), align 4

llvm/test/Transforms/SLPVectorizer/X86/PR32086.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 | FileCheck %s			; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 | FileCheck %s

	define void @i64_simplified(i64* noalias %st, i64* noalias %ld) {			define void @i64_simplified(i64* noalias %st, i64* noalias %ld) {
	; CHECK-LABEL: @i64_simplified(			; CHECK-LABEL: @i64_simplified(
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, i64* [[LD:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, i64* [[LD:%.*]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[LD]] to <2 x i64>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 8
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* [[ST:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* [[ST:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 2			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 2
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 3			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 3
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[LD]] to <2 x i64>*
				; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 8
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[ST]] to <4 x i64>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[ST]] to <4 x i64>*
	; CHECK-NEXT: store <4 x i64> [[SHUFFLE]], <4 x i64>* [[TMP3]], align 8			; CHECK-NEXT: store <4 x i64> [[SHUFFLE]], <4 x i64>* [[TMP3]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%arrayidx1 = getelementptr inbounds i64, i64* %ld, i64 1			%arrayidx1 = getelementptr inbounds i64, i64* %ld, i64 1

	%t0 = load i64, i64* %ld, align 8			%t0 = load i64, i64* %ld, align 8
	%t1 = load i64, i64* %arrayidx1, align 8			%t1 = load i64, i64* %arrayidx1, align 8

	%arrayidx3 = getelementptr inbounds i64, i64* %st, i64 1			%arrayidx3 = getelementptr inbounds i64, i64* %st, i64 1
	%arrayidx4 = getelementptr inbounds i64, i64* %st, i64 2			%arrayidx4 = getelementptr inbounds i64, i64* %st, i64 2
	%arrayidx5 = getelementptr inbounds i64, i64* %st, i64 3			%arrayidx5 = getelementptr inbounds i64, i64* %st, i64 3

	store i64 %t0, i64* %st, align 8			store i64 %t0, i64* %st, align 8
	store i64 %t1, i64* %arrayidx3, align 8			store i64 %t1, i64* %arrayidx3, align 8
	store i64 %t0, i64* %arrayidx4, align 8			store i64 %t0, i64* %arrayidx4, align 8
	store i64 %t1, i64* %arrayidx5, align 8			store i64 %t1, i64* %arrayidx5, align 8
	ret void			ret void
	}			}

	define void @i64_simplifiedi_reversed(i64* noalias %st, i64* noalias %ld) {			define void @i64_simplifiedi_reversed(i64* noalias %st, i64* noalias %ld) {
	; CHECK-LABEL: @i64_simplifiedi_reversed(			; CHECK-LABEL: @i64_simplifiedi_reversed(
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, i64* [[LD:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, i64* [[LD:%.*]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[LD]] to <2 x i64>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 8
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* [[ST:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* [[ST:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 2			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 2
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 3			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 3
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[LD]] to <2 x i64>*
				; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 8
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[ST]] to <4 x i64>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[ST]] to <4 x i64>*
	; CHECK-NEXT: store <4 x i64> [[SHUFFLE]], <4 x i64>* [[TMP3]], align 8			; CHECK-NEXT: store <4 x i64> [[SHUFFLE]], <4 x i64>* [[TMP3]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%arrayidx1 = getelementptr inbounds i64, i64* %ld, i64 1			%arrayidx1 = getelementptr inbounds i64, i64* %ld, i64 1

	%t0 = load i64, i64* %ld, align 8			%t0 = load i64, i64* %ld, align 8
	%t1 = load i64, i64* %arrayidx1, align 8			%t1 = load i64, i64* %arrayidx1, align 8

	%arrayidx3 = getelementptr inbounds i64, i64* %st, i64 1			%arrayidx3 = getelementptr inbounds i64, i64* %st, i64 1
	%arrayidx4 = getelementptr inbounds i64, i64* %st, i64 2			%arrayidx4 = getelementptr inbounds i64, i64* %st, i64 2
	%arrayidx5 = getelementptr inbounds i64, i64* %st, i64 3			%arrayidx5 = getelementptr inbounds i64, i64* %st, i64 3

	store i64 %t1, i64* %st, align 8			store i64 %t1, i64* %st, align 8
	store i64 %t0, i64* %arrayidx3, align 8			store i64 %t0, i64* %arrayidx3, align 8
	store i64 %t1, i64* %arrayidx4, align 8			store i64 %t1, i64* %arrayidx4, align 8
	store i64 %t0, i64* %arrayidx5, align 8			store i64 %t0, i64* %arrayidx5, align 8
	ret void			ret void
	}			}

	define void @i64_simplifiedi_extract(i64* noalias %st, i64* noalias %ld) {			define void @i64_simplifiedi_extract(i64* noalias %st, i64* noalias %ld) {
	; CHECK-LABEL: @i64_simplifiedi_extract(			; CHECK-LABEL: @i64_simplifiedi_extract(
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, i64* [[LD:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, i64* [[LD:%.*]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[LD]] to <2 x i64>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 8
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* [[ST:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* [[ST:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 2			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 2
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 3			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i64, i64* [[ST]], i64 3
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[LD]] to <2 x i64>*
				; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 8
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[ST]] to <4 x i64>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[ST]] to <4 x i64>*
	; CHECK-NEXT: store <4 x i64> [[SHUFFLE]], <4 x i64>* [[TMP3]], align 8			; CHECK-NEXT: store <4 x i64> [[SHUFFLE]], <4 x i64>* [[TMP3]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[SHUFFLE]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[SHUFFLE]], i32 3
	; CHECK-NEXT: store i64 [[TMP4]], i64* [[LD]], align 8			; CHECK-NEXT: store i64 [[TMP4]], i64* [[LD]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%arrayidx1 = getelementptr inbounds i64, i64* %ld, i64 1			%arrayidx1 = getelementptr inbounds i64, i64* %ld, i64 1


llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
	; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]			; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]
	; FORCE_REDUCTION: loop:			; FORCE_REDUCTION: loop:
	; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP12:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP12:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240>
	; FORCE_REDUCTION-NEXT: [[VAL_20:%.*]] = add i32 [[TMP2]], 1496			; FORCE_REDUCTION-NEXT: [[VAL_20:%.*]] = add i32 [[TMP2]], 1496
	; FORCE_REDUCTION-NEXT: [[VAL_34:%.*]] = add i32 [[TMP2]], 8555			; FORCE_REDUCTION-NEXT: [[VAL_34:%.*]] = add i32 [[TMP2]], 8555
				; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529
				; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685
				; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240>
	; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[TMP3]])			; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[TMP3]])
	; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = and i32 [[TMP4]], [[VAL_20]]			; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = and i32 [[TMP4]], [[VAL_20]]
	; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = and i32 [[TMP5]], [[VAL_34]]			; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = and i32 [[TMP5]], [[VAL_34]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP6]], [[TMP0:%.*]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP6]], [[TMP0:%.*]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP2]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP2]]
	; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529
	; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA27]], [[VAL_39]]			; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA27]], [[VAL_39]]
	; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685
	; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[VAL_40]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[VAL_40]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[VAL_41]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[VAL_41]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = and <2 x i32> [[TMP8]], [[TMP9]]			; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = and <2 x i32> [[TMP8]], [[TMP9]]
	; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = add <2 x i32> [[TMP8]], [[TMP9]]			; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = add <2 x i32> [[TMP8]], [[TMP9]]
	; FORCE_REDUCTION-NEXT: [[TMP12]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> [[TMP11]], <2 x i32> <i32 0, i32 3>			; FORCE_REDUCTION-NEXT: [[TMP12]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> [[TMP11]], <2 x i32> <i32 0, i32 3>
	; FORCE_REDUCTION-NEXT: br label [[LOOP]]			; FORCE_REDUCTION-NEXT: br label [[LOOP]]
	;			;

llvm/test/Transforms/SLPVectorizer/X86/addsub.ll


	; Check vectorization of following code for double data type-			; Check vectorization of following code for double data type-
	; c[0] = (a[0]+b[0])-d[0];			; c[0] = (a[0]+b[0])-d[0];
	; c[1] = d[1]+(a[1]+b[1]); //swapped d[1] and (a[1]+b[1])			; c[1] = d[1]+(a[1]+b[1]); //swapped d[1] and (a[1]+b[1])

	define void @reorder_alt_rightsubTree(double* nocapture %c, double* noalias nocapture readonly %a, double* noalias nocapture readonly %b, double* noalias nocapture readonly %d) {			define void @reorder_alt_rightsubTree(double* nocapture %c, double* noalias nocapture readonly %a, double* noalias nocapture readonly %b, double* noalias nocapture readonly %d) {
	; CHECK-LABEL: @reorder_alt_rightsubTree(			; CHECK-LABEL: @reorder_alt_rightsubTree(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[D]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 1
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 1			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 1
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[A]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[D]] to <2 x double>*
	; CHECK-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* [[TMP5]], align 8			; CHECK-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 1			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[A]] to <2 x double>*
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[B]] to <2 x double>*			; CHECK-NEXT: [[TMP8:%.*]] = load <2 x double>, <2 x double>* [[TMP7]], align 8
	; CHECK-NEXT: [[TMP9:%.*]] = load <2 x double>, <2 x double>* [[TMP8]], align 8			; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[B]] to <2 x double>*
	; CHECK-NEXT: [[TMP10:%.*]] = fadd <2 x double> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = load <2 x double>, <2 x double>* [[TMP9]], align 8
	; CHECK-NEXT: [[TMP11:%.*]] = fsub <2 x double> [[TMP10]], [[TMP3]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP8]], [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = fadd <2 x double> [[TMP10]], [[TMP3]]			; CHECK-NEXT: [[TMP12:%.*]] = fsub <2 x double> [[TMP11]], [[TMP6]]
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP13:%.*]] = fadd <2 x double> [[TMP11]], [[TMP6]]
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 1			; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP12]], <2 x double> [[TMP13]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[C]] to <2 x double>*			; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[C]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP13]], <2 x double>* [[TMP15]], align 8			; CHECK-NEXT: store <2 x double> [[TMP14]], <2 x double>* [[TMP15]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%1 = load double, double* %a			%1 = load double, double* %a
	%2 = load double, double* %b			%2 = load double, double* %b
	%3 = fadd double %1, %2			%3 = fadd double %1, %2
	%4 = load double, double* %d			%4 = load double, double* %d
	%5 = fsub double %3, %4			%5 = fsub double %3, %4
	store double %5, double* %c			store double %5, double* %c

llvm/test/Transforms/SLPVectorizer/X86/align.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	; Simple 3-pair chain with loads and stores			; Simple 3-pair chain with loads and stores
	define void @test1(double* %a, double* %b, double* %c) {			define void @test1(double* %a, double* %b, double* %c) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[AGG_TMP_I_I_SROA_0:%.*]] = alloca [3 x double], align 16			; CHECK-NEXT: [[AGG_TMP_I_I_SROA_0:%.*]] = alloca [3 x double], align 16
	; CHECK-NEXT: [[STORE1:%.*]] = getelementptr inbounds [3 x double], [3 x double]* [[AGG_TMP_I_I_SROA_0]], i64 0, i64 1			; CHECK-NEXT: [[STORE1:%.*]] = getelementptr inbounds [3 x double], [3 x double]* [[AGG_TMP_I_I_SROA_0]], i64 0, i64 1
	; CHECK-NEXT: [[STORE2:%.*]] = getelementptr inbounds [3 x double], [3 x double]* [[AGG_TMP_I_I_SROA_0]], i64 0, i64 2			; CHECK-NEXT: [[STORE2:%.*]] = getelementptr inbounds [3 x double], [3 x double]* [[AGG_TMP_I_I_SROA_0]], i64 0, i64 2
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 1
				; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[B]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[B]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[STORE1]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[STORE1]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	; value being loaded/stored not the alignment of the pointer type.			; value being loaded/stored not the alignment of the pointer type.

	define void @test2(float * %a, float * %b) {			define void @test2(float * %a, float * %b) {
	; CHECK-LABEL: @test2(			; CHECK-LABEL: @test2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[A1:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 1			; CHECK-NEXT: [[A1:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 1
	; CHECK-NEXT: [[A2:%.*]] = getelementptr inbounds float, float* [[A]], i64 2			; CHECK-NEXT: [[A2:%.*]] = getelementptr inbounds float, float* [[A]], i64 2
	; CHECK-NEXT: [[A3:%.*]] = getelementptr inbounds float, float* [[A]], i64 3			; CHECK-NEXT: [[A3:%.*]] = getelementptr inbounds float, float* [[A]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[A]] to <4 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[B1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1			; CHECK-NEXT: [[B1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1
	; CHECK-NEXT: [[B2:%.*]] = getelementptr inbounds float, float* [[B]], i64 2			; CHECK-NEXT: [[B2:%.*]] = getelementptr inbounds float, float* [[B]], i64 2
	; CHECK-NEXT: [[B3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3			; CHECK-NEXT: [[B3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[A]] to <4 x float>*
				; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[B]] to <4 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[B]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP1]], <4 x float>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x float> [[TMP1]], <4 x float>* [[TMP2]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%l0 = load float, float* %a			%l0 = load float, float* %a
	%a1 = getelementptr inbounds float, float* %a, i64 1			%a1 = getelementptr inbounds float, float* %a, i64 1
	%l1 = load float, float* %a1			%l1 = load float, float* %a1

llvm/test/Transforms/SLPVectorizer/X86/arith-abs.ll

declare i64 @llvm.abs.i64(i64, i1)		declare i64 @llvm.abs.i64(i64, i1)
declare i32 @llvm.abs.i32(i32, i1)		declare i32 @llvm.abs.i32(i32, i1)
declare i16 @llvm.abs.i16(i16, i1)		declare i16 @llvm.abs.i16(i16, i1)
declare i8 @llvm.abs.i8 (i8, i1)		declare i8 @llvm.abs.i8 (i8, i1)

define void @abs_v8i64() {		define void @abs_v8i64() {
; SSE-LABEL: @abs_v8i64(		; SSE-LABEL: @abs_v8i64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP1]], i1 false)
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP2]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP1]], i1 false)		; SSE-NEXT: [[TMP4:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP3]], i1 false)
; SSE-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP2]], i1 false)		; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP7:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP3]], i1 false)		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP8:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP4]], i1 false)		; SSE-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP5]], i1 false)
; SSE-NEXT: store <2 x i64> [[TMP5]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP7]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP8:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP7]], i1 false)
; SSE-NEXT: store <2 x i64> [[TMP8]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP8]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @abs_v8i64(		; SLM-LABEL: @abs_v8i64(
; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP1]], i1 false)
; SLM-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP2]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP5:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP1]], i1 false)		; SLM-NEXT: [[TMP4:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP3]], i1 false)
; SLM-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP2]], i1 false)		; SLM-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP7:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP3]], i1 false)		; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP8:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP4]], i1 false)		; SLM-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP5]], i1 false)
; SLM-NEXT: store <2 x i64> [[TMP5]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP7]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP8:%.*]] = call <2 x i64> @llvm.abs.v2i64(<2 x i64> [[TMP7]], i1 false)
; SLM-NEXT: store <2 x i64> [[TMP8]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP8]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @abs_v8i64(		; AVX-LABEL: @abs_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = call <4 x i64> @llvm.abs.v4i64(<4 x i64> [[TMP1]], i1 false)
; AVX-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.abs.v4i64(<4 x i64> [[TMP1]], i1 false)		; AVX-NEXT: store <4 x i64> [[TMP2]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP4:%.*]] = call <4 x i64> @llvm.abs.v4i64(<4 x i64> [[TMP2]], i1 false)		; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP4:%.*]] = call <4 x i64> @llvm.abs.v4i64(<4 x i64> [[TMP3]], i1 false)
; AVX-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @abs_v8i64(		; AVX512-LABEL: @abs_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = call <8 x i64> @llvm.abs.v8i64(<8 x i64> [[TMP1]], i1 false)		; AVX512-NEXT: [[TMP2:%.*]] = call <8 x i64> @llvm.abs.v8i64(<8 x i64> [[TMP1]], i1 false)
; AVX512-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @abs_v16i32() {		define void @abs_v16i32() {
; SSE-LABEL: @abs_v16i32(		; SSE-LABEL: @abs_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP1]], i1 false)
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP1]], i1 false)		; SSE-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP3]], i1 false)
; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP2]], i1 false)		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP3]], i1 false)		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP4]], i1 false)		; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP5]], i1 false)
; SSE-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP7]], i1 false)
; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @abs_v16i32(		; SLM-LABEL: @abs_v16i32(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP1]], i1 false)
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP1]], i1 false)		; SLM-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP3]], i1 false)
; SLM-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP2]], i1 false)		; SLM-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP3]], i1 false)		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP8:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP4]], i1 false)		; SLM-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP5]], i1 false)
; SLM-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP8:%.*]] = call <4 x i32> @llvm.abs.v4i32(<4 x i32> [[TMP7]], i1 false)
; SLM-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @abs_v16i32(		; AVX-LABEL: @abs_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.abs.v8i32(<8 x i32> [[TMP1]], i1 false)
; AVX-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.abs.v8i32(<8 x i32> [[TMP1]], i1 false)		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP4:%.*]] = call <8 x i32> @llvm.abs.v8i32(<8 x i32> [[TMP2]], i1 false)		; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP4:%.*]] = call <8 x i32> @llvm.abs.v8i32(<8 x i32> [[TMP3]], i1 false)
; AVX-NEXT: store <8 x i32> [[TMP4]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP4]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @abs_v16i32(		; AVX512-LABEL: @abs_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = call <16 x i32> @llvm.abs.v16i32(<16 x i32> [[TMP1]], i1 false)		; AVX512-NEXT: [[TMP2:%.*]] = call <16 x i32> @llvm.abs.v16i32(<16 x i32> [[TMP1]], i1 false)
; AVX512-NEXT: store <16 x i32> [[TMP2]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP2]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @abs_v32i16() {		define void @abs_v32i16() {
; SSE-LABEL: @abs_v32i16(		; SSE-LABEL: @abs_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP1]], i1 false)
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP1]], i1 false)		; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP3]], i1 false)
; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP2]], i1 false)		; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP3]], i1 false)		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP8:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP4]], i1 false)		; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP5]], i1 false)
; SSE-NEXT: store <8 x i16> [[TMP5]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP7]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP8:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP7]], i1 false)
; SSE-NEXT: store <8 x i16> [[TMP8]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP8]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @abs_v32i16(		; SLM-LABEL: @abs_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP1]], i1 false)
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP1]], i1 false)		; SLM-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP3]], i1 false)
; SLM-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP2]], i1 false)		; SLM-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP3]], i1 false)		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP8:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP4]], i1 false)		; SLM-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP5]], i1 false)
; SLM-NEXT: store <8 x i16> [[TMP5]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP7]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP8:%.*]] = call <8 x i16> @llvm.abs.v8i16(<8 x i16> [[TMP7]], i1 false)
; SLM-NEXT: store <8 x i16> [[TMP8]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP8]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @abs_v32i16(		; AVX-LABEL: @abs_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.abs.v16i16(<16 x i16> [[TMP1]], i1 false)
; AVX-NEXT: [[TMP3:%.*]] = call <16 x i16> @llvm.abs.v16i16(<16 x i16> [[TMP1]], i1 false)		; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP4:%.*]] = call <16 x i16> @llvm.abs.v16i16(<16 x i16> [[TMP2]], i1 false)		; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP4:%.*]] = call <16 x i16> @llvm.abs.v16i16(<16 x i16> [[TMP3]], i1 false)
; AVX-NEXT: store <16 x i16> [[TMP4]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP4]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @abs_v32i16(		; AVX512-LABEL: @abs_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = call <32 x i16> @llvm.abs.v32i16(<32 x i16> [[TMP1]], i1 false)		; AVX512-NEXT: [[TMP2:%.*]] = call <32 x i16> @llvm.abs.v32i16(<32 x i16> [[TMP1]], i1 false)
; AVX512-NEXT: store <32 x i16> [[TMP2]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP2]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @abs_v64i8() {		define void @abs_v64i8() {
; SSE-LABEL: @abs_v64i8(		; SSE-LABEL: @abs_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP1]], i1 false)
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP2]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP1]], i1 false)		; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP3]], i1 false)
; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP2]], i1 false)		; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP3]], i1 false)		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP8:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP4]], i1 false)		; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP5]], i1 false)
; SSE-NEXT: store <16 x i8> [[TMP5]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP7]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP8:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP7]], i1 false)
; SSE-NEXT: store <16 x i8> [[TMP8]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP8]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @abs_v64i8(		; SLM-LABEL: @abs_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP1]], i1 false)
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP2]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP1]], i1 false)		; SLM-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP3]], i1 false)
; SLM-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP2]], i1 false)		; SLM-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP3]], i1 false)		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP8:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP4]], i1 false)		; SLM-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP5]], i1 false)
; SLM-NEXT: store <16 x i8> [[TMP5]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP7]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP8:%.*]] = call <16 x i8> @llvm.abs.v16i8(<16 x i8> [[TMP7]], i1 false)
; SLM-NEXT: store <16 x i8> [[TMP8]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP8]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @abs_v64i8(		; AVX-LABEL: @abs_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> [[TMP1]], i1 false)
; AVX-NEXT: [[TMP3:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> [[TMP1]], i1 false)		; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP4:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> [[TMP2]], i1 false)		; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP4:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> [[TMP3]], i1 false)
; AVX-NEXT: store <32 x i8> [[TMP4]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP4]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @abs_v64i8(		; AVX512-LABEL: @abs_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = call <64 x i8> @llvm.abs.v64i8(<64 x i8> [[TMP1]], i1 false)		; AVX512-NEXT: [[TMP2:%.*]] = call <64 x i8> @llvm.abs.v64i8(<64 x i8> [[TMP1]], i1 false)
; AVX512-NEXT: store <64 x i8> [[TMP2]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP2]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll

; SLM-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8		; SLM-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
; SLM-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8		; SLM-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
; SLM-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		; SLM-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
; SLM-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		; SLM-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @add_v8i64(		; AVX-LABEL: @add_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]])
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v8i64(		; AVX512-LABEL: @add_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.sadd.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.sadd.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @add_v16i32() {		define void @add_v16i32() {
; SSE-LABEL: @add_v16i32(		; SSE-LABEL: @add_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @add_v16i32(		; SLM-LABEL: @add_v16i32(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @add_v16i32(		; AVX-LABEL: @add_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP4]], <8 x i32> [[TMP5]])
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v16i32(		; AVX512-LABEL: @add_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.sadd.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.sadd.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @add_v32i16() {		define void @add_v32i16() {
; SSE-LABEL: @add_v32i16(		; SSE-LABEL: @add_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @add_v32i16(		; SLM-LABEL: @add_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @add_v32i16(		; AVX-LABEL: @add_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP4]], <16 x i16> [[TMP5]])
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v32i16(		; AVX512-LABEL: @add_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.sadd.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.sadd.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @add_v64i8() {		define void @add_v64i8() {
; SSE-LABEL: @add_v64i8(		; SSE-LABEL: @add_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @add_v64i8(		; SLM-LABEL: @add_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SLM-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @add_v64i8(		; AVX-LABEL: @add_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP4]], <32 x i8> [[TMP5]])
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v64i8(		; AVX512-LABEL: @add_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.sadd.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.sadd.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1

llvm/test/Transforms/SLPVectorizer/X86/arith-add-usat.ll

; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8		; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8		; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @add_v8i64(		; AVX-LABEL: @add_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.uadd.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.uadd.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.uadd.sat.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.uadd.sat.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]])
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v8i64(		; AVX512-LABEL: @add_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.uadd.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.uadd.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
[...]
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @add_v16i32() {		define void @add_v16i32() {
; SSE-LABEL: @add_v16i32(		; SSE-LABEL: @add_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @add_v16i32(		; AVX-LABEL: @add_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.uadd.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.uadd.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.uadd.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.uadd.sat.v8i32(<8 x i32> [[TMP4]], <8 x i32> [[TMP5]])
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v16i32(		; AVX512-LABEL: @add_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.uadd.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.uadd.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
[...]
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @add_v32i16() {		define void @add_v32i16() {
; SSE-LABEL: @add_v32i16(		; SSE-LABEL: @add_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.uadd.sat.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @add_v32i16(		; AVX-LABEL: @add_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = call <16 x i16> @llvm.uadd.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.uadd.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.uadd.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.uadd.sat.v16i16(<16 x i16> [[TMP4]], <16 x i16> [[TMP5]])
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v32i16(		; AVX512-LABEL: @add_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.uadd.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.uadd.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
[...]
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @add_v64i8() {		define void @add_v64i8() {
; SSE-LABEL: @add_v64i8(		; SSE-LABEL: @add_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.uadd.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.uadd.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.uadd.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.uadd.sat.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.uadd.sat.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.uadd.sat.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.uadd.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.uadd.sat.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @add_v64i8(		; AVX-LABEL: @add_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = call <32 x i8> @llvm.uadd.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.uadd.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.uadd.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.uadd.sat.v32i8(<32 x i8> [[TMP4]], <32 x i8> [[TMP5]])
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v64i8(		; AVX512-LABEL: @add_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.uadd.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.uadd.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1

llvm/test/Transforms/SLPVectorizer/X86/arith-add.ll

@c16 = common global [32 x i16] zeroinitializer, align 64		@c16 = common global [32 x i16] zeroinitializer, align 64
@a8 = common global [64 x i8] zeroinitializer, align 64		@a8 = common global [64 x i8] zeroinitializer, align 64
@b8 = common global [64 x i8] zeroinitializer, align 64		@b8 = common global [64 x i8] zeroinitializer, align 64
@c8 = common global [64 x i8] zeroinitializer, align 64		@c8 = common global [64 x i8] zeroinitializer, align 64

define void @add_v8i64() {		define void @add_v8i64() {
; SSE-LABEL: @add_v8i64(		; SSE-LABEL: @add_v8i64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP9:%.*]] = add <2 x i64> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP10:%.*]] = add <2 x i64> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP11:%.*]] = add <2 x i64> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = add <2 x i64> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = add <2 x i64> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP12:%.*]] = add <2 x i64> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @add_v8i64(		; SLM-LABEL: @add_v8i64(
; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP9:%.*]] = add <2 x i64> [[TMP1]], [[TMP5]]		; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP10:%.*]] = add <2 x i64> [[TMP2]], [[TMP6]]		; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP11:%.*]] = add <2 x i64> [[TMP3]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = add <2 x i64> [[TMP7]], [[TMP8]]
; SLM-NEXT: [[TMP12:%.*]] = add <2 x i64> [[TMP4]], [[TMP8]]		; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP12:%.*]] = add <2 x i64> [[TMP10]], [[TMP11]]
; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @add_v8i64(		; AVX-LABEL: @add_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = add <4 x i64> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = add <4 x i64> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = add <4 x i64> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v8i64(		; AVX512-LABEL: @add_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = add <8 x i64> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = add <8 x i64> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @add_v16i32() {		define void @add_v16i32() {
; SSE-LABEL: @add_v16i32(		; SSE-LABEL: @add_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = add <4 x i32> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = add <4 x i32> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = add <4 x i32> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @add_v16i32(		; SLM-LABEL: @add_v16i32(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP1]], [[TMP5]]		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP2]], [[TMP6]]		; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP11:%.*]] = add <4 x i32> [[TMP3]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP7]], [[TMP8]]
; SLM-NEXT: [[TMP12:%.*]] = add <4 x i32> [[TMP4]], [[TMP8]]		; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP12:%.*]] = add <4 x i32> [[TMP10]], [[TMP11]]
; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @add_v16i32(		; AVX-LABEL: @add_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = add <8 x i32> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = add <8 x i32> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = add <8 x i32> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v16i32(		; AVX512-LABEL: @add_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = add <16 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = add <16 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @add_v32i16() {		define void @add_v32i16() {
; SSE-LABEL: @add_v32i16(		; SSE-LABEL: @add_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = add <8 x i16> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = add <8 x i16> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = add <8 x i16> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = add <8 x i16> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = add <8 x i16> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = add <8 x i16> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = add <8 x i16> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @add_v32i16(		; SLM-LABEL: @add_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP6:%.*]] = add <8 x i16> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP9:%.*]] = add <8 x i16> [[TMP1]], [[TMP5]]		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP10:%.*]] = add <8 x i16> [[TMP2]], [[TMP6]]		; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP11:%.*]] = add <8 x i16> [[TMP3]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = add <8 x i16> [[TMP7]], [[TMP8]]
; SLM-NEXT: [[TMP12:%.*]] = add <8 x i16> [[TMP4]], [[TMP8]]		; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP12:%.*]] = add <8 x i16> [[TMP10]], [[TMP11]]
; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @add_v32i16(		; AVX-LABEL: @add_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = add <16 x i16> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = add <16 x i16> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = add <16 x i16> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v32i16(		; AVX512-LABEL: @add_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = add <32 x i16> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = add <32 x i16> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @add_v64i8() {		define void @add_v64i8() {
; SSE-LABEL: @add_v64i8(		; SSE-LABEL: @add_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = add <16 x i8> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = add <16 x i8> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = add <16 x i8> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = add <16 x i8> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = add <16 x i8> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = add <16 x i8> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = add <16 x i8> [[TMP4]], [[TMP8]]		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = add <16 x i8> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @add_v64i8(		; SLM-LABEL: @add_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = add <16 x i8> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP6:%.*]] = add <16 x i8> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP9:%.*]] = add <16 x i8> [[TMP1]], [[TMP5]]		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP10:%.*]] = add <16 x i8> [[TMP2]], [[TMP6]]		; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP11:%.*]] = add <16 x i8> [[TMP3]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = add <16 x i8> [[TMP7]], [[TMP8]]
; SLM-NEXT: [[TMP12:%.*]] = add <16 x i8> [[TMP4]], [[TMP8]]		; SLM-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP12:%.*]] = add <16 x i8> [[TMP10]], [[TMP11]]
; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @add_v64i8(		; AVX-LABEL: @add_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = add <32 x i8> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = add <32 x i8> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = add <32 x i8> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = add <32 x i8> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @add_v64i8(		; AVX512-LABEL: @add_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = add <64 x i8> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = add <64 x i8> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1

llvm/test/Transforms/SLPVectorizer/X86/arith-div.ll

@c16 = common global [32 x i16] zeroinitializer, align 64		@c16 = common global [32 x i16] zeroinitializer, align 64
@a8 = common global [64 x i8] zeroinitializer, align 64		@a8 = common global [64 x i8] zeroinitializer, align 64
@b8 = common global [64 x i8] zeroinitializer, align 64		@b8 = common global [64 x i8] zeroinitializer, align 64
@c8 = common global [64 x i8] zeroinitializer, align 64		@c8 = common global [64 x i8] zeroinitializer, align 64

define void @sdiv_v16i32_uniformconst() {		define void @sdiv_v16i32_uniformconst() {
; SSE-LABEL: @sdiv_v16i32_uniformconst(		; SSE-LABEL: @sdiv_v16i32_uniformconst(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = sdiv <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = sdiv <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP4:%.*]] = sdiv <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: [[TMP6:%.*]] = sdiv <4 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = sdiv <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = sdiv <4 x i32> [[TMP4]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP6:%.*]] = sdiv <4 x i32> [[TMP5]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = sdiv <4 x i32> [[TMP7]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @sdiv_v16i32_uniformconst(		; SLM-LABEL: @sdiv_v16i32_uniformconst(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = sdiv <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = sdiv <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP4:%.*]] = sdiv <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: [[TMP6:%.*]] = sdiv <4 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = sdiv <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP8:%.*]] = sdiv <4 x i32> [[TMP4]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP6:%.*]] = sdiv <4 x i32> [[TMP5]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP8:%.*]] = sdiv <4 x i32> [[TMP7]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @sdiv_v16i32_uniformconst(		; AVX-LABEL: @sdiv_v16i32_uniformconst(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = sdiv <8 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX-NEXT: [[TMP3:%.*]] = sdiv <8 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP4:%.*]] = sdiv <8 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP4:%.*]] = sdiv <8 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX-NEXT: store <8 x i32> [[TMP4]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP4]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sdiv_v16i32_uniformconst(		; AVX512-LABEL: @sdiv_v16i32_uniformconst(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = sdiv <16 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX512-NEXT: [[TMP2:%.*]] = sdiv <16 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX512-NEXT: store <16 x i32> [[TMP2]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP2]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @srem_v16i32_uniformconst() {		define void @srem_v16i32_uniformconst() {
; SSE-LABEL: @srem_v16i32_uniformconst(		; SSE-LABEL: @srem_v16i32_uniformconst(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = srem <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = srem <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP4:%.*]] = srem <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: [[TMP6:%.*]] = srem <4 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = srem <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = srem <4 x i32> [[TMP4]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP6:%.*]] = srem <4 x i32> [[TMP5]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = srem <4 x i32> [[TMP7]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @srem_v16i32_uniformconst(		; SLM-LABEL: @srem_v16i32_uniformconst(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = srem <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = srem <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP4:%.*]] = srem <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: [[TMP6:%.*]] = srem <4 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = srem <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP8:%.*]] = srem <4 x i32> [[TMP4]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP6:%.*]] = srem <4 x i32> [[TMP5]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP8:%.*]] = srem <4 x i32> [[TMP7]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @srem_v16i32_uniformconst(		; AVX-LABEL: @srem_v16i32_uniformconst(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = srem <8 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX-NEXT: [[TMP3:%.*]] = srem <8 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP4:%.*]] = srem <8 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP4:%.*]] = srem <8 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX-NEXT: store <8 x i32> [[TMP4]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP4]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @srem_v16i32_uniformconst(		; AVX512-LABEL: @srem_v16i32_uniformconst(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = srem <16 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX512-NEXT: [[TMP2:%.*]] = srem <16 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX512-NEXT: store <16 x i32> [[TMP2]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP2]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @udiv_v16i32_uniformconst() {		define void @udiv_v16i32_uniformconst() {
; SSE-LABEL: @udiv_v16i32_uniformconst(		; SSE-LABEL: @udiv_v16i32_uniformconst(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = udiv <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = udiv <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP4:%.*]] = udiv <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: [[TMP6:%.*]] = udiv <4 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = udiv <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = udiv <4 x i32> [[TMP4]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP6:%.*]] = udiv <4 x i32> [[TMP5]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = udiv <4 x i32> [[TMP7]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @udiv_v16i32_uniformconst(		; SLM-LABEL: @udiv_v16i32_uniformconst(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = udiv <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = udiv <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP4:%.*]] = udiv <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: [[TMP6:%.*]] = udiv <4 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = udiv <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP8:%.*]] = udiv <4 x i32> [[TMP4]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP6:%.*]] = udiv <4 x i32> [[TMP5]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP8:%.*]] = udiv <4 x i32> [[TMP7]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @udiv_v16i32_uniformconst(		; AVX-LABEL: @udiv_v16i32_uniformconst(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = udiv <8 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX-NEXT: [[TMP3:%.*]] = udiv <8 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP4:%.*]] = udiv <8 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP4:%.*]] = udiv <8 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX-NEXT: store <8 x i32> [[TMP4]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP4]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @udiv_v16i32_uniformconst(		; AVX512-LABEL: @udiv_v16i32_uniformconst(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = udiv <16 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX512-NEXT: [[TMP2:%.*]] = udiv <16 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX512-NEXT: store <16 x i32> [[TMP2]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP2]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @urem_v16i32_uniformconst() {		define void @urem_v16i32_uniformconst() {
; SSE-LABEL: @urem_v16i32_uniformconst(		; SSE-LABEL: @urem_v16i32_uniformconst(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = urem <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = urem <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP4:%.*]] = urem <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: [[TMP6:%.*]] = urem <4 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = urem <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = urem <4 x i32> [[TMP4]], <i32 5, i32 5, i32 5, i32 5>		; SSE-NEXT: [[TMP6:%.*]] = urem <4 x i32> [[TMP5]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = urem <4 x i32> [[TMP7]], <i32 5, i32 5, i32 5, i32 5>
; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @urem_v16i32_uniformconst(		; SLM-LABEL: @urem_v16i32_uniformconst(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = urem <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = urem <4 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP4:%.*]] = urem <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: [[TMP6:%.*]] = urem <4 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = urem <4 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP8:%.*]] = urem <4 x i32> [[TMP4]], <i32 5, i32 5, i32 5, i32 5>		; SLM-NEXT: [[TMP6:%.*]] = urem <4 x i32> [[TMP5]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP8:%.*]] = urem <4 x i32> [[TMP7]], <i32 5, i32 5, i32 5, i32 5>
; SLM-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @urem_v16i32_uniformconst(		; AVX-LABEL: @urem_v16i32_uniformconst(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = urem <8 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX-NEXT: [[TMP3:%.*]] = urem <8 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP4:%.*]] = urem <8 x i32> [[TMP2]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP4:%.*]] = urem <8 x i32> [[TMP3]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX-NEXT: store <8 x i32> [[TMP4]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP4]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @urem_v16i32_uniformconst(		; AVX512-LABEL: @urem_v16i32_uniformconst(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = urem <16 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>		; AVX512-NEXT: [[TMP2:%.*]] = urem <16 x i32> [[TMP1]], <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
; AVX512-NEXT: store <16 x i32> [[TMP2]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP2]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/arith-fix.ll

declare i64 @llvm.smul.fix.i64(i64, i64, i32)		declare i64 @llvm.smul.fix.i64(i64, i64, i32)
declare i32 @llvm.smul.fix.i32(i32, i32, i32)		declare i32 @llvm.smul.fix.i32(i32, i32, i32)
declare i16 @llvm.smul.fix.i16(i16, i16, i32)		declare i16 @llvm.smul.fix.i16(i16, i16, i32)
declare i8 @llvm.smul.fix.i8 (i8 , i8 , i32)		declare i8 @llvm.smul.fix.i8 (i8 , i8 , i32)

define void @smul_v8i64() {		define void @smul_v8i64() {
; SSE-LABEL: @smul_v8i64(		; SSE-LABEL: @smul_v8i64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP2]], i32 3)
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP5]], i32 3)
; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP5]], i32 3)		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP10:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP2]], <2 x i64> [[TMP6]], i32 3)		; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP11:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP3]], <2 x i64> [[TMP7]], i32 3)		; SSE-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i32 3)
; SSE-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP8]], i32 3)		; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i32 3)
; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smul_v8i64(		; SLM-LABEL: @smul_v8i64(
; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP2]], i32 3)
; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP5]], i32 3)
; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP5]], i32 3)		; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP10:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP2]], <2 x i64> [[TMP6]], i32 3)		; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP11:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP3]], <2 x i64> [[TMP7]], i32 3)		; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i32 3)
; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP8]], i32 3)		; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i32 3)
; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX1-LABEL: @smul_v8i64(		; AVX1-LABEL: @smul_v8i64(
; AVX1-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP2]], i32 3)
; AVX1-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; AVX1-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP5]], i32 3)
; AVX1-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; AVX1-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP5]], i32 3)		; AVX1-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP10:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP2]], <2 x i64> [[TMP6]], i32 3)		; AVX1-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP11:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP3]], <2 x i64> [[TMP7]], i32 3)		; AVX1-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i32 3)
; AVX1-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP8]], i32 3)		; AVX1-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; AVX1-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; AVX1-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; AVX1-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.smul.fix.v2i64(<2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i32 3)
; AVX1-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; AVX1-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; AVX1-NEXT: ret void		; AVX1-NEXT: ret void
;		;
; AVX2-LABEL: @smul_v8i64(		; AVX2-LABEL: @smul_v8i64(
; AVX2-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.smul.fix.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]], i32 3)
; AVX2-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX2-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.smul.fix.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]], i32 3)		; AVX2-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.smul.fix.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]], i32 3)		; AVX2-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX2-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.smul.fix.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]], i32 3)
; AVX2-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX2-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @smul_v8i64(		; AVX512-LABEL: @smul_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.smul.fix.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]], i32 3)		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.smul.fix.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]], i32 3)
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; AVX256BW-LABEL: @smul_v8i64(		; AVX256BW-LABEL: @smul_v8i64(
; AVX256BW-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX256BW-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX256BW-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256BW-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX256BW-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX256BW-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.smul.fix.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]], i32 3)
; AVX256BW-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256BW-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX256BW-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.smul.fix.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]], i32 3)		; AVX256BW-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256BW-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.smul.fix.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]], i32 3)		; AVX256BW-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256BW-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX256BW-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.smul.fix.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]], i32 3)
; AVX256BW-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256BW-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256BW-NEXT: ret void		; AVX256BW-NEXT: ret void
;		;
%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8		%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8
%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8		%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8
%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8		%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8
%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8		%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8
%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8		%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @smul_v16i32() {		define void @smul_v16i32() {
; SSE-LABEL: @smul_v16i32(		; SSE-LABEL: @smul_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]], i32 3)
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]], i32 3)
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]], i32 3)		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]], i32 3)		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]], i32 3)		; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]], i32 3)
; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]], i32 3)		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]], i32 3)
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smul_v16i32(		; SLM-LABEL: @smul_v16i32(
; SLM-NEXT: [[A0:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0), align 4		; SLM-NEXT: [[A0:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0), align 4
; SLM-NEXT: [[A1:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1), align 4		; SLM-NEXT: [[A1:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1), align 4
; SLM-NEXT: [[A2:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2), align 4		; SLM-NEXT: [[A2:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2), align 4
; SLM-NEXT: [[A3:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3), align 4		; SLM-NEXT: [[A3:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3), align 4
; SLM-NEXT: store i32 [[R12]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12), align 4		; SLM-NEXT: store i32 [[R12]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12), align 4
; SLM-NEXT: store i32 [[R13]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 13), align 4		; SLM-NEXT: store i32 [[R13]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 13), align 4
; SLM-NEXT: store i32 [[R14]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		; SLM-NEXT: store i32 [[R14]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
; SLM-NEXT: store i32 [[R15]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		; SLM-NEXT: store i32 [[R15]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @smul_v16i32(		; AVX-LABEL: @smul_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.smul.fix.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP2]], i32 3)
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.smul.fix.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]], i32 3)		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.smul.fix.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]], i32 3)		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.smul.fix.v8i32(<8 x i32> [[TMP4]], <8 x i32> [[TMP5]], i32 3)
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @smul_v16i32(		; AVX512-LABEL: @smul_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.smul.fix.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]], i32 3)		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.smul.fix.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]], i32 3)
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @smul_v32i16() {		define void @smul_v32i16() {
; SSE-LABEL: @smul_v32i16(		; SSE-LABEL: @smul_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]], i32 3)
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]], i32 3)
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]], i32 3)		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]], i32 3)		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]], i32 3)		; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i32 3)
; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]], i32 3)		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i32 3)
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smul_v32i16(		; SLM-LABEL: @smul_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]], i32 3)
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]], i32 3)
; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]], i32 3)		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]], i32 3)		; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]], i32 3)		; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i32 3)
; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]], i32 3)		; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smul.fix.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i32 3)
; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @smul_v32i16(		; AVX-LABEL: @smul_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = call <16 x i16> @llvm.smul.fix.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP2]], i32 3)
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.smul.fix.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]], i32 3)		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.smul.fix.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]], i32 3)		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.smul.fix.v16i16(<16 x i16> [[TMP4]], <16 x i16> [[TMP5]], i32 3)
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @smul_v32i16(		; AVX512-LABEL: @smul_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.smul.fix.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]], i32 3)		; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.smul.fix.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]], i32 3)
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @smul_v64i8() {		define void @smul_v64i8() {
; SSE-LABEL: @smul_v64i8(		; SSE-LABEL: @smul_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], i32 3)
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]], i32 3)
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]], i32 3)		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]], i32 3)		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]], i32 3)		; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]], i32 3)
; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]], i32 3)		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]], i32 3)
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smul_v64i8(		; SLM-LABEL: @smul_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], i32 3)
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]], i32 3)
; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]], i32 3)		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]], i32 3)		; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]], i32 3)		; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]], i32 3)
; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]], i32 3)		; SLM-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smul.fix.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]], i32 3)
; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @smul_v64i8(		; AVX-LABEL: @smul_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = call <32 x i8> @llvm.smul.fix.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP2]], i32 3)
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.smul.fix.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]], i32 3)		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.smul.fix.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]], i32 3)		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.smul.fix.v32i8(<32 x i8> [[TMP4]], <32 x i8> [[TMP5]], i32 3)
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @smul_v64i8(		; AVX512-LABEL: @smul_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.smul.fix.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]], i32 3)		; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.smul.fix.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]], i32 3)
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
[261 lines not shown]
declare i64 @llvm.umul.fix.i64(i64, i64, i32)		declare i64 @llvm.umul.fix.i64(i64, i64, i32)
declare i32 @llvm.umul.fix.i32(i32, i32, i32)		declare i32 @llvm.umul.fix.i32(i32, i32, i32)
declare i16 @llvm.umul.fix.i16(i16, i16, i32)		declare i16 @llvm.umul.fix.i16(i16, i16, i32)
declare i8 @llvm.umul.fix.i8 (i8 , i8 , i32)		declare i8 @llvm.umul.fix.i8 (i8 , i8 , i32)

define void @umul_v8i64() {		define void @umul_v8i64() {
; SSE-LABEL: @umul_v8i64(		; SSE-LABEL: @umul_v8i64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP2]], i32 3)
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP5]], i32 3)
; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP5]], i32 3)		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP10:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP2]], <2 x i64> [[TMP6]], i32 3)		; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP11:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP3]], <2 x i64> [[TMP7]], i32 3)		; SSE-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i32 3)
; SSE-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP8]], i32 3)		; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i32 3)
; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umul_v8i64(		; SLM-LABEL: @umul_v8i64(
; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP2]], i32 3)
; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP5]], i32 3)
; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP5]], i32 3)		; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP10:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP2]], <2 x i64> [[TMP6]], i32 3)		; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP11:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP3]], <2 x i64> [[TMP7]], i32 3)		; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i32 3)
; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP8]], i32 3)		; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i32 3)
; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX1-LABEL: @umul_v8i64(		; AVX1-LABEL: @umul_v8i64(
; AVX1-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP2]], i32 3)
; AVX1-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; AVX1-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP5]], i32 3)
; AVX1-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; AVX1-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP5]], i32 3)		; AVX1-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP10:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP2]], <2 x i64> [[TMP6]], i32 3)		; AVX1-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; AVX1-NEXT: [[TMP11:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP3]], <2 x i64> [[TMP7]], i32 3)		; AVX1-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]], i32 3)
; AVX1-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP8]], i32 3)		; AVX1-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; AVX1-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; AVX1-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; AVX1-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; AVX1-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.umul.fix.v2i64(<2 x i64> [[TMP10]], <2 x i64> [[TMP11]], i32 3)
; AVX1-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; AVX1-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; AVX1-NEXT: ret void		; AVX1-NEXT: ret void
;		;
; AVX2-LABEL: @umul_v8i64(		; AVX2-LABEL: @umul_v8i64(
; AVX2-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.umul.fix.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]], i32 3)
; AVX2-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX2-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.umul.fix.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]], i32 3)		; AVX2-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.umul.fix.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]], i32 3)		; AVX2-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX2-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.umul.fix.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]], i32 3)
; AVX2-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX2-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @umul_v8i64(		; AVX512-LABEL: @umul_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.umul.fix.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]], i32 3)		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.umul.fix.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]], i32 3)
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; AVX256BW-LABEL: @umul_v8i64(		; AVX256BW-LABEL: @umul_v8i64(
; AVX256BW-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX256BW-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX256BW-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256BW-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX256BW-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX256BW-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.umul.fix.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]], i32 3)
; AVX256BW-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256BW-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX256BW-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.umul.fix.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]], i32 3)		; AVX256BW-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256BW-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.umul.fix.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]], i32 3)		; AVX256BW-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256BW-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX256BW-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.umul.fix.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]], i32 3)
; AVX256BW-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256BW-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256BW-NEXT: ret void		; AVX256BW-NEXT: ret void
;		;
%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8		%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8
%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8		%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8
%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8		%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8
%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8		%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8
%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8		%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8
[25 lines not shown]
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @umul_v16i32() {		define void @umul_v16i32() {
; SSE-LABEL: @umul_v16i32(		; SSE-LABEL: @umul_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.umul.fix.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]], i32 3)
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.umul.fix.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]], i32 3)
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.umul.fix.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]], i32 3)		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.umul.fix.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]], i32 3)		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.umul.fix.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]], i32 3)		; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.umul.fix.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]], i32 3)
; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.umul.fix.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]], i32 3)		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.umul.fix.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]], i32 3)
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umul_v16i32(		; SLM-LABEL: @umul_v16i32(
; SLM-NEXT: [[A0:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0), align 4		; SLM-NEXT: [[A0:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0), align 4
; SLM-NEXT: [[A1:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1), align 4		; SLM-NEXT: [[A1:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1), align 4
; SLM-NEXT: [[A2:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2), align 4		; SLM-NEXT: [[A2:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2), align 4
; SLM-NEXT: [[A3:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3), align 4		; SLM-NEXT: [[A3:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3), align 4
; SLM-NEXT: store i32 [[R12]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12), align 4		; SLM-NEXT: store i32 [[R12]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12), align 4
; SLM-NEXT: store i32 [[R13]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 13), align 4		; SLM-NEXT: store i32 [[R13]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 13), align 4
; SLM-NEXT: store i32 [[R14]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		; SLM-NEXT: store i32 [[R14]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
; SLM-NEXT: store i32 [[R15]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		; SLM-NEXT: store i32 [[R15]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @umul_v16i32(		; AVX-LABEL: @umul_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.umul.fix.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP2]], i32 3)
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.umul.fix.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]], i32 3)		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.umul.fix.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]], i32 3)		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.umul.fix.v8i32(<8 x i32> [[TMP4]], <8 x i32> [[TMP5]], i32 3)
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @umul_v16i32(		; AVX512-LABEL: @umul_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.umul.fix.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]], i32 3)		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.umul.fix.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]], i32 3)
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @umul_v32i16() {		define void @umul_v32i16() {
; SSE-LABEL: @umul_v32i16(		; SSE-LABEL: @umul_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]], i32 3)
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]], i32 3)
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]], i32 3)		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]], i32 3)		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]], i32 3)		; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i32 3)
; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]], i32 3)		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i32 3)
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umul_v32i16(		; SLM-LABEL: @umul_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]], i32 3)
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]], i32 3)
; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]], i32 3)		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]], i32 3)		; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]], i32 3)		; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i32 3)
; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]], i32 3)		; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umul.fix.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i32 3)
; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @umul_v32i16(		; AVX-LABEL: @umul_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = call <16 x i16> @llvm.umul.fix.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP2]], i32 3)
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.umul.fix.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]], i32 3)		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.umul.fix.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]], i32 3)		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.umul.fix.v16i16(<16 x i16> [[TMP4]], <16 x i16> [[TMP5]], i32 3)
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @umul_v32i16(		; AVX512-LABEL: @umul_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.umul.fix.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]], i32 3)		; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.umul.fix.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]], i32 3)
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @umul_v64i8() {		define void @umul_v64i8() {
; SSE-LABEL: @umul_v64i8(		; SSE-LABEL: @umul_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], i32 3)
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]], i32 3)
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]], i32 3)		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]], i32 3)		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]], i32 3)		; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]], i32 3)
; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]], i32 3)		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]], i32 3)
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umul_v64i8(		; SLM-LABEL: @umul_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]], i32 3)
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]], i32 3)
; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]], i32 3)		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]], i32 3)		; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]], i32 3)		; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]], i32 3)
; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]], i32 3)		; SLM-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umul.fix.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]], i32 3)
; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @umul_v64i8(		; AVX-LABEL: @umul_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = call <32 x i8> @llvm.umul.fix.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP2]], i32 3)
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.umul.fix.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]], i32 3)		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.umul.fix.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]], i32 3)		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.umul.fix.v32i8(<32 x i8> [[TMP4]], <32 x i8> [[TMP5]], i32 3)
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @umul_v64i8(		; AVX512-LABEL: @umul_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.umul.fix.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]], i32 3)		; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.umul.fix.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]], i32 3)
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1

llvm/test/Transforms/SLPVectorizer/X86/arith-mul.ll

; SLM-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8		; SLM-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
; SLM-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8		; SLM-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
; SLM-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		; SLM-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
; SLM-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		; SLM-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX128-LABEL: @mul_v8i64(		; AVX128-LABEL: @mul_v8i64(
; AVX128-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; AVX128-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; AVX128-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; AVX128-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; AVX128-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; AVX128-NEXT: [[TMP3:%.*]] = mul <2 x i64> [[TMP1]], [[TMP2]]
; AVX128-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; AVX128-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; AVX128-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; AVX128-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; AVX128-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; AVX128-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; AVX128-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; AVX128-NEXT: [[TMP6:%.*]] = mul <2 x i64> [[TMP4]], [[TMP5]]
; AVX128-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; AVX128-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; AVX128-NEXT: [[TMP9:%.*]] = mul <2 x i64> [[TMP1]], [[TMP5]]		; AVX128-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; AVX128-NEXT: [[TMP10:%.*]] = mul <2 x i64> [[TMP2]], [[TMP6]]		; AVX128-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; AVX128-NEXT: [[TMP11:%.*]] = mul <2 x i64> [[TMP3]], [[TMP7]]		; AVX128-NEXT: [[TMP9:%.*]] = mul <2 x i64> [[TMP7]], [[TMP8]]
; AVX128-NEXT: [[TMP12:%.*]] = mul <2 x i64> [[TMP4]], [[TMP8]]		; AVX128-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; AVX128-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; AVX128-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; AVX128-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; AVX128-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; AVX128-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; AVX128-NEXT: [[TMP12:%.*]] = mul <2 x i64> [[TMP10]], [[TMP11]]
; AVX128-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; AVX128-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; AVX128-NEXT: ret void		; AVX128-NEXT: ret void
;		;
; AVX256-LABEL: @mul_v8i64(		; AVX256-LABEL: @mul_v8i64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX256-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX256-NEXT: [[TMP3:%.*]] = mul <4 x i64> [[TMP1]], [[TMP2]]
; AVX256-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX256-NEXT: [[TMP5:%.*]] = mul <4 x i64> [[TMP1]], [[TMP3]]		; AVX256-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256-NEXT: [[TMP6:%.*]] = mul <4 x i64> [[TMP2]], [[TMP4]]		; AVX256-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX256-NEXT: [[TMP6:%.*]] = mul <4 x i64> [[TMP4]], [[TMP5]]
; AVX256-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @mul_v8i64(		; AVX512-LABEL: @mul_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = mul <8 x i64> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = mul <8 x i64> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @mul_v16i32() {		define void @mul_v16i32() {
; SSE-LABEL: @mul_v16i32(		; SSE-LABEL: @mul_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = mul <4 x i32> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = mul <4 x i32> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = mul <4 x i32> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = mul <4 x i32> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = mul <4 x i32> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = mul <4 x i32> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = mul <4 x i32> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @mul_v16i32(		; SLM-LABEL: @mul_v16i32(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP6:%.*]] = mul <4 x i32> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP9:%.*]] = mul <4 x i32> [[TMP1]], [[TMP5]]		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP10:%.*]] = mul <4 x i32> [[TMP2]], [[TMP6]]		; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP11:%.*]] = mul <4 x i32> [[TMP3]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = mul <4 x i32> [[TMP7]], [[TMP8]]
; SLM-NEXT: [[TMP12:%.*]] = mul <4 x i32> [[TMP4]], [[TMP8]]		; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP12:%.*]] = mul <4 x i32> [[TMP10]], [[TMP11]]
; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX128-LABEL: @mul_v16i32(		; AVX128-LABEL: @mul_v16i32(
; AVX128-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; AVX128-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; AVX128-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; AVX128-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; AVX128-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; AVX128-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[TMP2]]
; AVX128-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; AVX128-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; AVX128-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; AVX128-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; AVX128-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; AVX128-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; AVX128-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; AVX128-NEXT: [[TMP6:%.*]] = mul <4 x i32> [[TMP4]], [[TMP5]]
; AVX128-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; AVX128-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; AVX128-NEXT: [[TMP9:%.*]] = mul <4 x i32> [[TMP1]], [[TMP5]]		; AVX128-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; AVX128-NEXT: [[TMP10:%.*]] = mul <4 x i32> [[TMP2]], [[TMP6]]		; AVX128-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; AVX128-NEXT: [[TMP11:%.*]] = mul <4 x i32> [[TMP3]], [[TMP7]]		; AVX128-NEXT: [[TMP9:%.*]] = mul <4 x i32> [[TMP7]], [[TMP8]]
; AVX128-NEXT: [[TMP12:%.*]] = mul <4 x i32> [[TMP4]], [[TMP8]]		; AVX128-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; AVX128-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; AVX128-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; AVX128-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; AVX128-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; AVX128-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; AVX128-NEXT: [[TMP12:%.*]] = mul <4 x i32> [[TMP10]], [[TMP11]]
; AVX128-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; AVX128-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; AVX128-NEXT: ret void		; AVX128-NEXT: ret void
;		;
; AVX256-LABEL: @mul_v16i32(		; AVX256-LABEL: @mul_v16i32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX256-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX256-NEXT: [[TMP3:%.*]] = mul <8 x i32> [[TMP1]], [[TMP2]]
; AVX256-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX256-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX256-NEXT: [[TMP5:%.*]] = mul <8 x i32> [[TMP1]], [[TMP3]]		; AVX256-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX256-NEXT: [[TMP6:%.*]] = mul <8 x i32> [[TMP2]], [[TMP4]]		; AVX256-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX256-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX256-NEXT: [[TMP6:%.*]] = mul <8 x i32> [[TMP4]], [[TMP5]]
; AVX256-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX256-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @mul_v16i32(		; AVX512-LABEL: @mul_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = mul <16 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = mul <16 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @mul_v32i16() {		define void @mul_v32i16() {
; SSE-LABEL: @mul_v32i16(		; SSE-LABEL: @mul_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = mul <8 x i16> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = mul <8 x i16> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = mul <8 x i16> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = mul <8 x i16> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = mul <8 x i16> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = mul <8 x i16> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = mul <8 x i16> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = mul <8 x i16> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @mul_v32i16(		; SLM-LABEL: @mul_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = mul <8 x i16> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP6:%.*]] = mul <8 x i16> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP9:%.*]] = mul <8 x i16> [[TMP1]], [[TMP5]]		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP10:%.*]] = mul <8 x i16> [[TMP2]], [[TMP6]]		; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP11:%.*]] = mul <8 x i16> [[TMP3]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = mul <8 x i16> [[TMP7]], [[TMP8]]
; SLM-NEXT: [[TMP12:%.*]] = mul <8 x i16> [[TMP4]], [[TMP8]]		; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP12:%.*]] = mul <8 x i16> [[TMP10]], [[TMP11]]
; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX128-LABEL: @mul_v32i16(		; AVX128-LABEL: @mul_v32i16(
; AVX128-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16> bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; AVX128-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16> bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; AVX128-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; AVX128-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16> bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; AVX128-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; AVX128-NEXT: [[TMP3:%.*]] = mul <8 x i16> [[TMP1]], [[TMP2]]
; AVX128-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; AVX128-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; AVX128-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16> bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; AVX128-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; AVX128-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; AVX128-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; AVX128-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; AVX128-NEXT: [[TMP6:%.*]] = mul <8 x i16> [[TMP4]], [[TMP5]]
; AVX128-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; AVX128-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; AVX128-NEXT: [[TMP9:%.*]] = mul <8 x i16> [[TMP1]], [[TMP5]]		; AVX128-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; AVX128-NEXT: [[TMP10:%.*]] = mul <8 x i16> [[TMP2]], [[TMP6]]		; AVX128-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; AVX128-NEXT: [[TMP11:%.*]] = mul <8 x i16> [[TMP3]], [[TMP7]]		; AVX128-NEXT: [[TMP9:%.*]] = mul <8 x i16> [[TMP7]], [[TMP8]]
; AVX128-NEXT: [[TMP12:%.*]] = mul <8 x i16> [[TMP4]], [[TMP8]]		; AVX128-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; AVX128-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; AVX128-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; AVX128-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; AVX128-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; AVX128-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; AVX128-NEXT: [[TMP12:%.*]] = mul <8 x i16> [[TMP10]], [[TMP11]]
; AVX128-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; AVX128-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; AVX128-NEXT: ret void		; AVX128-NEXT: ret void
;		;
; AVX256-LABEL: @mul_v32i16(		; AVX256-LABEL: @mul_v32i16(
; AVX256-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX256-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX256-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX256-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX256-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX256-NEXT: [[TMP3:%.*]] = mul <16 x i16> [[TMP1]], [[TMP2]]
; AVX256-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX256-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX256-NEXT: [[TMP5:%.*]] = mul <16 x i16> [[TMP1]], [[TMP3]]		; AVX256-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX256-NEXT: [[TMP6:%.*]] = mul <16 x i16> [[TMP2]], [[TMP4]]		; AVX256-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX256-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX256-NEXT: [[TMP6:%.*]] = mul <16 x i16> [[TMP4]], [[TMP5]]
; AVX256-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX256-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @mul_v32i16(		; AVX512-LABEL: @mul_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16> bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16> bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16> bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16> bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = mul <32 x i16> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = mul <32 x i16> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @mul_v64i8() {		define void @mul_v64i8() {
; SSE-LABEL: @mul_v64i8(		; SSE-LABEL: @mul_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = mul <16 x i8> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = mul <16 x i8> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = mul <16 x i8> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = mul <16 x i8> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = mul <16 x i8> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = mul <16 x i8> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = mul <16 x i8> [[TMP4]], [[TMP8]]		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = mul <16 x i8> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @mul_v64i8(		; SLM-LABEL: @mul_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = mul <16 x i8> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP6:%.*]] = mul <16 x i8> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP9:%.*]] = mul <16 x i8> [[TMP1]], [[TMP5]]		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP10:%.*]] = mul <16 x i8> [[TMP2]], [[TMP6]]		; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP11:%.*]] = mul <16 x i8> [[TMP3]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = mul <16 x i8> [[TMP7]], [[TMP8]]
; SLM-NEXT: [[TMP12:%.*]] = mul <16 x i8> [[TMP4]], [[TMP8]]		; SLM-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP12:%.*]] = mul <16 x i8> [[TMP10]], [[TMP11]]
; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX128-LABEL: @mul_v64i8(		; AVX128-LABEL: @mul_v64i8(
; AVX128-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; AVX128-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; AVX128-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; AVX128-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; AVX128-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; AVX128-NEXT: [[TMP3:%.*]] = mul <16 x i8> [[TMP1]], [[TMP2]]
; AVX128-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; AVX128-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; AVX128-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8> bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; AVX128-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; AVX128-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; AVX128-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; AVX128-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; AVX128-NEXT: [[TMP6:%.*]] = mul <16 x i8> [[TMP4]], [[TMP5]]
; AVX128-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; AVX128-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; AVX128-NEXT: [[TMP9:%.*]] = mul <16 x i8> [[TMP1]], [[TMP5]]		; AVX128-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; AVX128-NEXT: [[TMP10:%.*]] = mul <16 x i8> [[TMP2]], [[TMP6]]		; AVX128-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; AVX128-NEXT: [[TMP11:%.*]] = mul <16 x i8> [[TMP3]], [[TMP7]]		; AVX128-NEXT: [[TMP9:%.*]] = mul <16 x i8> [[TMP7]], [[TMP8]]
; AVX128-NEXT: [[TMP12:%.*]] = mul <16 x i8> [[TMP4]], [[TMP8]]		; AVX128-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; AVX128-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; AVX128-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; AVX128-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; AVX128-NEXT: [[TMP12:%.*]] = mul <16 x i8> [[TMP10]], [[TMP11]]
; AVX128-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; AVX128-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; AVX128-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; AVX128-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; AVX128-NEXT: ret void		; AVX128-NEXT: ret void
;		;
; AVX256-LABEL: @mul_v64i8(		; AVX256-LABEL: @mul_v64i8(
; AVX256-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8> bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX256-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8> bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX256-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX256-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8> bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX256-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8> bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX256-NEXT: [[TMP3:%.*]] = mul <32 x i8> [[TMP1]], [[TMP2]]
; AVX256-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX256-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX256-NEXT: [[TMP5:%.*]] = mul <32 x i8> [[TMP1]], [[TMP3]]		; AVX256-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX256-NEXT: [[TMP6:%.*]] = mul <32 x i8> [[TMP2]], [[TMP4]]		; AVX256-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX256-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX256-NEXT: [[TMP6:%.*]] = mul <32 x i8> [[TMP4]], [[TMP5]]
; AVX256-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX256-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @mul_v64i8(		; AVX512-LABEL: @mul_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8> bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8> bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8> bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8> bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = mul <64 x i8> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = mul <64 x i8> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1

llvm/test/Transforms/SLPVectorizer/X86/arith-smax.ll

; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8		; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8		; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smax_v8i64(		; SLM-LABEL: @smax_v8i64(
; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64> bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64> bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64> bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.smax.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64> bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.smax.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.smax.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP10:%.*]] = call <2 x i64> @llvm.smax.v2i64(<2 x i64> [[TMP2]], <2 x i64> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP11:%.*]] = call <2 x i64> @llvm.smax.v2i64(<2 x i64> [[TMP3]], <2 x i64> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.smax.v2i64(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.smax.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP8]])		; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.smax.v2i64(<2 x i64> [[TMP10]], <2 x i64> [[TMP11]])
; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @smax_v8i64(		; AVX-LABEL: @smax_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64> bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64> bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64> bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64> bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]])
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @smax_v8i64(		; AVX512-LABEL: @smax_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64> bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64> bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64> bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64> bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.smax.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.smax.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @smax_v16i32() {		define void @smax_v16i32() {
; SSE-LABEL: @smax_v16i32(		; SSE-LABEL: @smax_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smax_v16i32(		; SLM-LABEL: @smax_v16i32(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.smax.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @smax_v16i32(		; AVX-LABEL: @smax_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.smax.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.smax.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.smax.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.smax.v8i32(<8 x i32> [[TMP4]], <8 x i32> [[TMP5]])
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @smax_v16i32(		; AVX512-LABEL: @smax_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.smax.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.smax.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @smax_v32i16() {		define void @smax_v32i16() {
; SSE-LABEL: @smax_v32i16(		; SSE-LABEL: @smax_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smax_v32i16(		; SLM-LABEL: @smax_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smax.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @smax_v32i16(		; AVX-LABEL: @smax_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = call <16 x i16> @llvm.smax.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.smax.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.smax.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.smax.v16i16(<16 x i16> [[TMP4]], <16 x i16> [[TMP5]])
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @smax_v32i16(		; AVX512-LABEL: @smax_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.smax.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.smax.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @smax_v64i8() {		define void @smax_v64i8() {
; SSE-LABEL: @smax_v64i8(		; SSE-LABEL: @smax_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smax_v64i8(		; SLM-LABEL: @smax_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SLM-NEXT: [[TMP10:%.]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP11:%.]] = load <16 x i8>, <16 x i8> bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @smax_v64i8(		; AVX-LABEL: @smax_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = call <32 x i8> @llvm.smax.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.smax.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.smax.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.smax.v32i8(<32 x i8> [[TMP4]], <32 x i8> [[TMP5]])
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @smax_v64i8(		; AVX512-LABEL: @smax_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.smax.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.smax.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1

llvm/test/Transforms/SLPVectorizer/X86/arith-smin.ll

; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8		; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8		; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smin_v8i64(		; SLM-LABEL: @smin_v8i64(
; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.smin.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.smin.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.smin.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP10:%.*]] = call <2 x i64> @llvm.smin.v2i64(<2 x i64> [[TMP2]], <2 x i64> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP11:%.*]] = call <2 x i64> @llvm.smin.v2i64(<2 x i64> [[TMP3]], <2 x i64> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.smin.v2i64(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.smin.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP8]])		; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.smin.v2i64(<2 x i64> [[TMP10]], <2 x i64> [[TMP11]])
; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @smin_v8i64(		; AVX-LABEL: @smin_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.smin.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.smin.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.smin.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.smin.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]])
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @smin_v8i64(		; AVX512-LABEL: @smin_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.smin.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.smin.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @smin_v16i32() {		define void @smin_v16i32() {
; SSE-LABEL: @smin_v16i32(		; SSE-LABEL: @smin_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smin_v16i32(		; SLM-LABEL: @smin_v16i32(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @smin_v16i32(		; AVX-LABEL: @smin_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.smin.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.smin.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.smin.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.smin.v8i32(<8 x i32> [[TMP4]], <8 x i32> [[TMP5]])
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @smin_v16i32(		; AVX512-LABEL: @smin_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.smin.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.smin.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @smin_v32i16() {		define void @smin_v32i16() {
; SSE-LABEL: @smin_v32i16(		; SSE-LABEL: @smin_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smin_v32i16(		; SLM-LABEL: @smin_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.smin.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @smin_v32i16(		; AVX-LABEL: @smin_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = call <16 x i16> @llvm.smin.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.smin.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.smin.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.smin.v16i16(<16 x i16> [[TMP4]], <16 x i16> [[TMP5]])
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @smin_v32i16(		; AVX512-LABEL: @smin_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.smin.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.smin.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @smin_v64i8() {		define void @smin_v64i8() {
; SSE-LABEL: @smin_v64i8(		; SSE-LABEL: @smin_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @smin_v64i8(		; SLM-LABEL: @smin_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SLM-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.smin.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @smin_v64i8(		; AVX-LABEL: @smin_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = call <32 x i8> @llvm.smin.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.smin.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.smin.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.smin.v32i8(<32 x i8> [[TMP4]], <32 x i8> [[TMP5]])
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @smin_v64i8(		; AVX512-LABEL: @smin_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.smin.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.smin.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1

llvm/test/Transforms/SLPVectorizer/X86/arith-sub-ssat.ll

; SLM-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8		; SLM-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
; SLM-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8		; SLM-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
; SLM-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		; SLM-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
; SLM-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		; SLM-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @sub_v8i64(		; AVX-LABEL: @sub_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]])
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v8i64(		; AVX512-LABEL: @sub_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.ssub.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.ssub.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @sub_v16i32() {		define void @sub_v16i32() {
; SSE-LABEL: @sub_v16i32(		; SSE-LABEL: @sub_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @sub_v16i32(		; SLM-LABEL: @sub_v16i32(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @sub_v16i32(		; AVX-LABEL: @sub_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP4]], <8 x i32> [[TMP5]])
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v16i32(		; AVX512-LABEL: @sub_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.ssub.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.ssub.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @sub_v32i16() {		define void @sub_v32i16() {
; SSE-LABEL: @sub_v32i16(		; SSE-LABEL: @sub_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @sub_v32i16(		; SLM-LABEL: @sub_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @sub_v32i16(		; AVX-LABEL: @sub_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP4]], <16 x i16> [[TMP5]])
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v32i16(		; AVX512-LABEL: @sub_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.ssub.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.ssub.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @sub_v64i8() {		define void @sub_v64i8() {
; SSE-LABEL: @sub_v64i8(		; SSE-LABEL: @sub_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @sub_v64i8(		; SLM-LABEL: @sub_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SLM-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @sub_v64i8(		; AVX-LABEL: @sub_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP4]], <32 x i8> [[TMP5]])
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v64i8(		; AVX512-LABEL: @sub_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.ssub.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.ssub.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1

llvm/test/Transforms/SLPVectorizer/X86/arith-sub-usat.ll

; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8		; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8		; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sub_v8i64(		; AVX-LABEL: @sub_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.usub.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.usub.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.usub.sat.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.usub.sat.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]])
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v8i64(		; AVX512-LABEL: @sub_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.usub.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.usub.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @sub_v16i32() {		define void @sub_v16i32() {
; SSE-LABEL: @sub_v16i32(		; SSE-LABEL: @sub_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sub_v16i32(		; AVX-LABEL: @sub_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.usub.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.usub.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.usub.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.usub.sat.v8i32(<8 x i32> [[TMP4]], <8 x i32> [[TMP5]])
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v16i32(		; AVX512-LABEL: @sub_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.usub.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.usub.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
(64 lines not shown)
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @sub_v32i16() {		define void @sub_v32i16() {
; SSE-LABEL: @sub_v32i16(		; SSE-LABEL: @sub_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.usub.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.usub.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.usub.sat.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.usub.sat.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.usub.sat.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.usub.sat.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.usub.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.usub.sat.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sub_v32i16(		; AVX-LABEL: @sub_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = call <16 x i16> @llvm.usub.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.usub.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.usub.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.usub.sat.v16i16(<16 x i16> [[TMP4]], <16 x i16> [[TMP5]])
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v32i16(		; AVX512-LABEL: @sub_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.usub.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.usub.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
(128 lines not shown)
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @sub_v64i8() {		define void @sub_v64i8() {
; SSE-LABEL: @sub_v64i8(		; SSE-LABEL: @sub_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.usub.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.usub.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.usub.sat.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.usub.sat.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.usub.sat.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.usub.sat.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.usub.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.usub.sat.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sub_v64i8(		; AVX-LABEL: @sub_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = call <32 x i8> @llvm.usub.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.usub.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.usub.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.usub.sat.v32i8(<32 x i8> [[TMP4]], <32 x i8> [[TMP5]])
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v64i8(		; AVX512-LABEL: @sub_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.usub.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.usub.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
(260 lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/arith-sub.ll

(19 lines not shown)
@c16 = common global [32 x i16] zeroinitializer, align 64		@c16 = common global [32 x i16] zeroinitializer, align 64
@a8 = common global [64 x i8] zeroinitializer, align 64		@a8 = common global [64 x i8] zeroinitializer, align 64
@b8 = common global [64 x i8] zeroinitializer, align 64		@b8 = common global [64 x i8] zeroinitializer, align 64
@c8 = common global [64 x i8] zeroinitializer, align 64		@c8 = common global [64 x i8] zeroinitializer, align 64

define void @sub_v8i64() {		define void @sub_v8i64() {
; SSE-LABEL: @sub_v8i64(		; SSE-LABEL: @sub_v8i64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP9:%.*]] = sub <2 x i64> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP10:%.*]] = sub <2 x i64> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP11:%.*]] = sub <2 x i64> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = sub <2 x i64> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = sub <2 x i64> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP12:%.*]] = sub <2 x i64> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @sub_v8i64(		; SLM-LABEL: @sub_v8i64(
; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP9:%.*]] = sub <2 x i64> [[TMP1]], [[TMP5]]		; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP10:%.*]] = sub <2 x i64> [[TMP2]], [[TMP6]]		; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP11:%.*]] = sub <2 x i64> [[TMP3]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = sub <2 x i64> [[TMP7]], [[TMP8]]
; SLM-NEXT: [[TMP12:%.*]] = sub <2 x i64> [[TMP4]], [[TMP8]]		; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP12:%.*]] = sub <2 x i64> [[TMP10]], [[TMP11]]
; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @sub_v8i64(		; AVX-LABEL: @sub_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = sub <4 x i64> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = sub <4 x i64> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = sub <4 x i64> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v8i64(		; AVX512-LABEL: @sub_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = sub <8 x i64> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = sub <8 x i64> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @sub_v16i32() {		define void @sub_v16i32() {
; SSE-LABEL: @sub_v16i32(		; SSE-LABEL: @sub_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = sub <4 x i32> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = sub <4 x i32> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = sub <4 x i32> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = sub <4 x i32> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = sub <4 x i32> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = sub <4 x i32> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @sub_v16i32(		; SLM-LABEL: @sub_v16i32(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP9:%.*]] = sub <4 x i32> [[TMP1]], [[TMP5]]		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP10:%.*]] = sub <4 x i32> [[TMP2]], [[TMP6]]		; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP11:%.*]] = sub <4 x i32> [[TMP3]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = sub <4 x i32> [[TMP7]], [[TMP8]]
; SLM-NEXT: [[TMP12:%.*]] = sub <4 x i32> [[TMP4]], [[TMP8]]		; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP12:%.*]] = sub <4 x i32> [[TMP10]], [[TMP11]]
; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @sub_v16i32(		; AVX-LABEL: @sub_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = sub <8 x i32> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = sub <8 x i32> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = sub <8 x i32> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v16i32(		; AVX512-LABEL: @sub_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = sub <16 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = sub <16 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @sub_v32i16() {		define void @sub_v32i16() {
; SSE-LABEL: @sub_v32i16(		; SSE-LABEL: @sub_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = sub <8 x i16> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = sub <8 x i16> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = sub <8 x i16> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = sub <8 x i16> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = sub <8 x i16> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = sub <8 x i16> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = sub <8 x i16> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = sub <8 x i16> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @sub_v32i16(		; SLM-LABEL: @sub_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = sub <8 x i16> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP6:%.*]] = sub <8 x i16> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP9:%.*]] = sub <8 x i16> [[TMP1]], [[TMP5]]		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP10:%.*]] = sub <8 x i16> [[TMP2]], [[TMP6]]		; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP11:%.*]] = sub <8 x i16> [[TMP3]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = sub <8 x i16> [[TMP7]], [[TMP8]]
; SLM-NEXT: [[TMP12:%.*]] = sub <8 x i16> [[TMP4]], [[TMP8]]		; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP12:%.*]] = sub <8 x i16> [[TMP10]], [[TMP11]]
; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @sub_v32i16(		; AVX-LABEL: @sub_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = sub <16 x i16> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = sub <16 x i16> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = sub <16 x i16> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v32i16(		; AVX512-LABEL: @sub_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = sub <32 x i16> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = sub <32 x i16> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @sub_v64i8() {		define void @sub_v64i8() {
; SSE-LABEL: @sub_v64i8(		; SSE-LABEL: @sub_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = sub <16 x i8> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = sub <16 x i8> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = sub <16 x i8> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = sub <16 x i8> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = sub <16 x i8> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = sub <16 x i8> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = sub <16 x i8> [[TMP4]], [[TMP8]]		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = sub <16 x i8> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @sub_v64i8(		; SLM-LABEL: @sub_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = sub <16 x i8> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP6:%.*]] = sub <16 x i8> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP9:%.*]] = sub <16 x i8> [[TMP1]], [[TMP5]]		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP10:%.*]] = sub <16 x i8> [[TMP2]], [[TMP6]]		; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP11:%.*]] = sub <16 x i8> [[TMP3]], [[TMP7]]		; SLM-NEXT: [[TMP9:%.*]] = sub <16 x i8> [[TMP7]], [[TMP8]]
; SLM-NEXT: [[TMP12:%.*]] = sub <16 x i8> [[TMP4]], [[TMP8]]		; SLM-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP12:%.*]] = sub <16 x i8> [[TMP10]], [[TMP11]]
; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @sub_v64i8(		; AVX-LABEL: @sub_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = sub <32 x i8> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = sub <32 x i8> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = sub <32 x i8> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = sub <32 x i8> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @sub_v64i8(		; AVX512-LABEL: @sub_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = sub <64 x i8> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = sub <64 x i8> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1

llvm/test/Transforms/SLPVectorizer/X86/arith-umax.ll

; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8		; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8		; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umax_v8i64(		; SLM-LABEL: @umax_v8i64(
; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.umax.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.umax.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.umax.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP10:%.*]] = call <2 x i64> @llvm.umax.v2i64(<2 x i64> [[TMP2]], <2 x i64> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP11:%.*]] = call <2 x i64> @llvm.umax.v2i64(<2 x i64> [[TMP3]], <2 x i64> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.umax.v2i64(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.umax.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP8]])		; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.umax.v2i64(<2 x i64> [[TMP10]], <2 x i64> [[TMP11]])
; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @umax_v8i64(		; AVX-LABEL: @umax_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.umax.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.umax.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.umax.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.umax.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]])
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @umax_v8i64(		; AVX512-LABEL: @umax_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.umax.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.umax.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @umax_v16i32() {		define void @umax_v16i32() {
; SSE-LABEL: @umax_v16i32(		; SSE-LABEL: @umax_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umax_v16i32(		; SLM-LABEL: @umax_v16i32(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.umax.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @umax_v16i32(		; AVX-LABEL: @umax_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.umax.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.umax.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.umax.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.umax.v8i32(<8 x i32> [[TMP4]], <8 x i32> [[TMP5]])
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @umax_v16i32(		; AVX512-LABEL: @umax_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.umax.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.umax.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @umax_v32i16() {		define void @umax_v32i16() {
; SSE-LABEL: @umax_v32i16(		; SSE-LABEL: @umax_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umax_v32i16(		; SLM-LABEL: @umax_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umax.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @umax_v32i16(		; AVX-LABEL: @umax_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = call <16 x i16> @llvm.umax.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.umax.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.umax.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.umax.v16i16(<16 x i16> [[TMP4]], <16 x i16> [[TMP5]])
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @umax_v32i16(		; AVX512-LABEL: @umax_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.umax.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.umax.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
[128 lines not shown]
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @umax_v64i8() {		define void @umax_v64i8() {
; SSE-LABEL: @umax_v64i8(		; SSE-LABEL: @umax_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umax_v64i8(		; SLM-LABEL: @umax_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SLM-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umax.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @umax_v64i8(		; AVX-LABEL: @umax_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = call <32 x i8> @llvm.umax.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.umax.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.umax.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.umax.v32i8(<32 x i8> [[TMP4]], <32 x i8> [[TMP5]])
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @umax_v64i8(		; AVX512-LABEL: @umax_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.umax.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.umax.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
[260 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/arith-umin.ll

[57 lines not shown]
; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8		; SSE-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8		; SSE-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		; SSE-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		; SSE-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umin_v8i64(		; SLM-LABEL: @umin_v8i64(
; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.umin.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP6:%.*]] = call <2 x i64> @llvm.umin.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.umin.v2i64(<2 x i64> [[TMP1]], <2 x i64> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP10:%.*]] = call <2 x i64> @llvm.umin.v2i64(<2 x i64> [[TMP2]], <2 x i64> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: [[TMP11:%.*]] = call <2 x i64> @llvm.umin.v2i64(<2 x i64> [[TMP3]], <2 x i64> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <2 x i64> @llvm.umin.v2i64(<2 x i64> [[TMP7]], <2 x i64> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.umin.v2i64(<2 x i64> [[TMP4]], <2 x i64> [[TMP8]])		; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SLM-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SLM-NEXT: [[TMP12:%.*]] = call <2 x i64> @llvm.umin.v2i64(<2 x i64> [[TMP10]], <2 x i64> [[TMP11]])
; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SLM-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @umin_v8i64(		; AVX-LABEL: @umin_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.umin.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.umin.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.umin.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.umin.v4i64(<4 x i64> [[TMP4]], <4 x i64> [[TMP5]])
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @umin_v8i64(		; AVX512-LABEL: @umin_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.umin.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.umin.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
[32 lines not shown]
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @umin_v16i32() {		define void @umin_v16i32() {
; SSE-LABEL: @umin_v16i32(		; SSE-LABEL: @umin_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umin_v16i32(		; SLM-LABEL: @umin_v16i32(
; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP6:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP1]], <4 x i32> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP2]], <4 x i32> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP7]], <4 x i32> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])		; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SLM-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.umin.v4i32(<4 x i32> [[TMP10]], <4 x i32> [[TMP11]])
; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @umin_v16i32(		; AVX-LABEL: @umin_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = call <8 x i32> @llvm.umin.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.umin.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.umin.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.umin.v8i32(<8 x i32> [[TMP4]], <8 x i32> [[TMP5]])
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @umin_v16i32(		; AVX512-LABEL: @umin_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.umin.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.umin.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
[... 64 lines not shown ...]
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @umin_v32i16() {		define void @umin_v32i16() {
; SSE-LABEL: @umin_v32i16(		; SSE-LABEL: @umin_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umin_v32i16(		; SLM-LABEL: @umin_v32i16(
; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP6:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP1]], <8 x i16> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP10:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP2]], <8 x i16> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])		; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SLM-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.umin.v8i16(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]])
; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @umin_v32i16(		; AVX-LABEL: @umin_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = call <16 x i16> @llvm.umin.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.umin.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.umin.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.umin.v16i16(<16 x i16> [[TMP4]], <16 x i16> [[TMP5]])
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @umin_v32i16(		; AVX512-LABEL: @umin_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.umin.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.umin.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
[... 128 lines not shown ...]
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @umin_v64i8() {		define void @umin_v64i8() {
; SSE-LABEL: @umin_v64i8(		; SSE-LABEL: @umin_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; SLM-LABEL: @umin_v64i8(		; SLM-LABEL: @umin_v64i8(
; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP2]])
; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP6:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP5]])
; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP1]], <16 x i8> [[TMP5]])		; SLM-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP10:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP2]], <16 x i8> [[TMP6]])		; SLM-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])		; SLM-NEXT: [[TMP9:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP7]], <16 x i8> [[TMP8]])
; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])		; SLM-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SLM-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.umin.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SLM-NEXT: ret void		; SLM-NEXT: ret void
;		;
; AVX-LABEL: @umin_v64i8(		; AVX-LABEL: @umin_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = call <32 x i8> @llvm.umin.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.umin.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.umin.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.umin.v32i8(<32 x i8> [[TMP4]], <32 x i8> [[TMP5]])
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @umin_v64i8(		; AVX512-LABEL: @umin_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.umin.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.umin.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
[... 260 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/bitreverse.ll

[... 34 lines not shown ...]
store i64 %bitreverse0, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i32 0, i64 0), align 8		store i64 %bitreverse0, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i32 0, i64 0), align 8
store i64 %bitreverse1, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i32 0, i64 1), align 8		store i64 %bitreverse1, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @bitreverse_4i64() #0 {		define void @bitreverse_4i64() #0 {
; SSE-LABEL: @bitreverse_4i64(		; SSE-LABEL: @bitreverse_4i64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([4 x i64]* @src64 to <2 x i64>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([4 x i64]* @src64 to <2 x i64>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 2) to <2 x i64>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.bitreverse.v2i64(<2 x i64> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.bitreverse.v2i64(<2 x i64> [[TMP1]])		; SSE-NEXT: store <2 x i64> [[TMP2]], <2 x i64>* bitcast ([4 x i64]* @dst64 to <2 x i64>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = call <2 x i64> @llvm.bitreverse.v2i64(<2 x i64> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 2) to <2 x i64>*), align 4
; SSE-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([4 x i64]* @dst64 to <2 x i64>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = call <2 x i64> @llvm.bitreverse.v2i64(<2 x i64> [[TMP3]])
; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* bitcast (i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 2) to <2 x i64>*), align 4		; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* bitcast (i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 2) to <2 x i64>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @bitreverse_4i64(		; AVX-LABEL: @bitreverse_4i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([4 x i64]* @src64 to <4 x i64>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([4 x i64]* @src64 to <4 x i64>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = call <4 x i64> @llvm.bitreverse.v4i64(<4 x i64> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <4 x i64> @llvm.bitreverse.v4i64(<4 x i64> [[TMP1]])
; AVX-NEXT: store <4 x i64> [[TMP2]], <4 x i64>* bitcast ([4 x i64]* @dst64 to <4 x i64>*), align 4		; AVX-NEXT: store <4 x i64> [[TMP2]], <4 x i64>* bitcast ([4 x i64]* @dst64 to <4 x i64>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
[... 39 lines not shown ...]
store i32 %bitreverse2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4		store i32 %bitreverse2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4
store i32 %bitreverse3, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4		store i32 %bitreverse3, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @bitreverse_8i32() #0 {		define void @bitreverse_8i32() #0 {
; SSE-LABEL: @bitreverse_8i32(		; SSE-LABEL: @bitreverse_8i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.bitreverse.v4i32(<4 x i32> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.bitreverse.v4i32(<4 x i32> [[TMP1]])		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2
; SSE-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.bitreverse.v4i32(<4 x i32> [[TMP2]])		; SSE-NEXT: [[TMP3:%.]] = load <4 x i32>, <4 x i32> bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.bitreverse.v4i32(<4 x i32> [[TMP3]])
; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @bitreverse_8i32(		; AVX-LABEL: @bitreverse_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.bitreverse.v8i32(<8 x i32> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.bitreverse.v8i32(<8 x i32> [[TMP1]])
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store i16 %bitreverse6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2		store i16 %bitreverse6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2
store i16 %bitreverse7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2		store i16 %bitreverse7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2
ret void		ret void
}		}

define void @bitreverse_16i16() #0 {		define void @bitreverse_16i16() #0 {
; SSE-LABEL: @bitreverse_16i16(		; SSE-LABEL: @bitreverse_16i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.bitreverse.v8i16(<8 x i16> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.bitreverse.v8i16(<8 x i16> [[TMP1]])		; SSE-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.bitreverse.v8i16(<8 x i16> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.bitreverse.v8i16(<8 x i16> [[TMP3]])
; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @bitreverse_16i16(		; AVX-LABEL: @bitreverse_16i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.bitreverse.v16i16(<16 x i16> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.bitreverse.v16i16(<16 x i16> [[TMP1]])
; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store i8 %bitreverse14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1		store i8 %bitreverse14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1
store i8 %bitreverse15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1		store i8 %bitreverse15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1
ret void		ret void
}		}

define void @bitreverse_32i8() #0 {		define void @bitreverse_32i8() #0 {
; SSE-LABEL: @bitreverse_32i8(		; SSE-LABEL: @bitreverse_32i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.bitreverse.v16i8(<16 x i8> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.bitreverse.v16i8(<16 x i8> [[TMP1]])		; SSE-NEXT: store <16 x i8> [[TMP2]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.bitreverse.v16i8(<16 x i8> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.bitreverse.v16i8(<16 x i8> [[TMP3]])
; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @bitreverse_32i8(		; AVX-LABEL: @bitreverse_32i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.bitreverse.v32i8(<32 x i8> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.bitreverse.v32i8(<32 x i8> [[TMP1]])
; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/broadcast.ll


	define void @bcast_vals(i64 %A, i64 %B, i64 *%S) {			define void @bcast_vals(i64 %A, i64 %B, i64 *%S) {
	; CHECK-LABEL: @bcast_vals(			; CHECK-LABEL: @bcast_vals(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[A0:%.*]] = load i64, i64* [[A:%.*]], align 8			; CHECK-NEXT: [[A0:%.*]] = load i64, i64* [[A:%.*]], align 8
	; CHECK-NEXT: [[B0:%.*]] = load i64, i64* [[B:%.*]], align 8			; CHECK-NEXT: [[B0:%.*]] = load i64, i64* [[B:%.*]], align 8
	; CHECK-NEXT: [[V1:%.*]] = sub i64 [[A0]], 1			; CHECK-NEXT: [[V1:%.*]] = sub i64 [[A0]], 1
	; CHECK-NEXT: [[V2:%.*]] = sub i64 [[B0]], 1			; CHECK-NEXT: [[V2:%.*]] = sub i64 [[B0]], 1
				; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds i64, i64* [[S:%.*]], i64 0
				; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds i64, i64* [[S]], i64 1
				; CHECK-NEXT: [[IDXS2:%.*]] = getelementptr inbounds i64, i64* [[S]], i64 2
				; CHECK-NEXT: [[IDXS3:%.*]] = getelementptr inbounds i64, i64* [[S]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i64> poison, i64 [[V1]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i64> poison, i64 [[V1]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i64> [[TMP0]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i64> [[TMP0]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i64> poison, i64 [[V2]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i64> poison, i64 [[V2]], i32 0
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i64> [[TMP1]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i64> [[TMP1]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i64> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i64> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds i64, i64* [[S:%.*]], i64 0
	; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds i64, i64* [[S]], i64 1
	; CHECK-NEXT: [[IDXS2:%.*]] = getelementptr inbounds i64, i64* [[S]], i64 2
	; CHECK-NEXT: [[IDXS3:%.*]] = getelementptr inbounds i64, i64* [[S]], i64 3
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[IDXS0]] to <4 x i64>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[IDXS0]] to <4 x i64>*
	; CHECK-NEXT: store <4 x i64> [[TMP2]], <4 x i64>* [[TMP3]], align 8			; CHECK-NEXT: store <4 x i64> [[TMP2]], <4 x i64>* [[TMP3]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%A0 = load i64, i64 *%A, align 8			%A0 = load i64, i64 *%A, align 8
	%B0 = load i64, i64 *%B, align 8			%B0 = load i64, i64 *%B, align 8

	;			;
	; We broadcast %v1.			; We broadcast %v1.

	;			;
	define void @bcast_vals2(i16 %A, i16 %B, i16 %C, i16 %D, i16 %E, i32 %S) {			define void @bcast_vals2(i16 %A, i16 %B, i16 %C, i16 %D, i16 %E, i32 %S) {
	; CHECK-LABEL: @bcast_vals2(			; CHECK-LABEL: @bcast_vals2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[A0:%.*]] = load i16, i16* [[A:%.*]], align 8			; CHECK-NEXT: [[A0:%.*]] = load i16, i16* [[A:%.*]], align 8
				; CHECK-NEXT: [[V1:%.*]] = sext i16 [[A0]] to i32
				; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds i32, i32* [[S:%.*]], i64 0
				; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds i32, i32* [[S]], i64 1
				; CHECK-NEXT: [[IDXS2:%.*]] = getelementptr inbounds i32, i32* [[S]], i64 2
				; CHECK-NEXT: [[IDXS3:%.*]] = getelementptr inbounds i32, i32* [[S]], i64 3
	; CHECK-NEXT: [[B0:%.*]] = load i16, i16* [[B:%.*]], align 8			; CHECK-NEXT: [[B0:%.*]] = load i16, i16* [[B:%.*]], align 8
	; CHECK-NEXT: [[C0:%.*]] = load i16, i16* [[C:%.*]], align 8			; CHECK-NEXT: [[C0:%.*]] = load i16, i16* [[C:%.*]], align 8
	; CHECK-NEXT: [[D0:%.*]] = load i16, i16* [[D:%.*]], align 8			; CHECK-NEXT: [[D0:%.*]] = load i16, i16* [[D:%.*]], align 8
	; CHECK-NEXT: [[E0:%.*]] = load i16, i16* [[E:%.*]], align 8			; CHECK-NEXT: [[E0:%.*]] = load i16, i16* [[E:%.*]], align 8
	; CHECK-NEXT: [[V1:%.*]] = sext i16 [[A0]] to i32
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i16> poison, i16 [[B0]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i16> poison, i16 [[B0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i16> [[TMP0]], i16 [[C0]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i16> [[TMP0]], i16 [[C0]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i16> [[TMP1]], i16 [[E0]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i16> [[TMP1]], i16 [[E0]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i16> [[TMP2]], i16 [[D0]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i16> [[TMP2]], i16 [[D0]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = sext <4 x i16> [[TMP3]] to <4 x i32>			; CHECK-NEXT: [[TMP4:%.*]] = sext <4 x i16> [[TMP3]] to <4 x i32>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> poison, i32 [[V1]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> poison, i32 [[V1]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i32> [[SHUFFLE]], [[TMP4]]			; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i32> [[SHUFFLE]], [[TMP4]]
	; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds i32, i32* [[S:%.*]], i64 0
	; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds i32, i32* [[S]], i64 1
	; CHECK-NEXT: [[IDXS2:%.*]] = getelementptr inbounds i32, i32* [[S]], i64 2
	; CHECK-NEXT: [[IDXS3:%.*]] = getelementptr inbounds i32, i32* [[S]], i64 3
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[IDXS0]] to <4 x i32>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[IDXS0]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* [[TMP7]], align 8			; CHECK-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* [[TMP7]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%A0 = load i16, i16 *%A, align 8			%A0 = load i16, i16 *%A, align 8
	%B0 = load i16, i16 *%B, align 8			%B0 = load i16, i16 *%B, align 8
	%C0 = load i16, i16 *%C, align 8			%C0 = load i16, i16 *%C, align 8

llvm/test/Transforms/SLPVectorizer/X86/bswap.ll

store i32 %bswap2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4		store i32 %bswap2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4
store i32 %bswap3, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4		store i32 %bswap3, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @bswap_8i32() #0 {		define void @bswap_8i32() #0 {
; SSE-LABEL: @bswap_8i32(		; SSE-LABEL: @bswap_8i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.bswap.v4i32(<4 x i32> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.bswap.v4i32(<4 x i32> [[TMP1]])		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2
; SSE-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.bswap.v4i32(<4 x i32> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.bswap.v4i32(<4 x i32> [[TMP3]])
; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @bswap_8i32(		; AVX-LABEL: @bswap_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.bswap.v8i32(<8 x i32> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.bswap.v8i32(<8 x i32> [[TMP1]])
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store i16 %bswap6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2		store i16 %bswap6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2
store i16 %bswap7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2		store i16 %bswap7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2
ret void		ret void
}		}

define void @bswap_16i16() #0 {		define void @bswap_16i16() #0 {
; SSE-LABEL: @bswap_16i16(		; SSE-LABEL: @bswap_16i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.bswap.v8i16(<8 x i16> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.bswap.v8i16(<8 x i16> [[TMP1]])		; SSE-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.bswap.v8i16(<8 x i16> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.bswap.v8i16(<8 x i16> [[TMP3]])
; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @bswap_16i16(		; AVX-LABEL: @bswap_16i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.bswap.v16i16(<16 x i16> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.bswap.v16i16(<16 x i16> [[TMP1]])
; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/combined-stores-chains.ll

	; CHECK-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 4			; CHECK-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 4
	; CHECK-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5			; CHECK-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
	; CHECK-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6			; CHECK-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
	; CHECK-NEXT: [[T32:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7			; CHECK-NEXT: [[T32:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
	; CHECK-NEXT: [[T212:%.*]] = getelementptr inbounds i64, i64* [[T02]], i64 8			; CHECK-NEXT: [[T212:%.*]] = getelementptr inbounds i64, i64* [[T02]], i64 8
	; CHECK-NEXT: [[T252:%.*]] = getelementptr inbounds i64, i64* [[T02]], i64 9			; CHECK-NEXT: [[T252:%.*]] = getelementptr inbounds i64, i64* [[T02]], i64 9
	; CHECK-NEXT: [[T292:%.*]] = getelementptr inbounds i64, i64* [[T02]], i64 10			; CHECK-NEXT: [[T292:%.*]] = getelementptr inbounds i64, i64* [[T02]], i64 10
	; CHECK-NEXT: [[T322:%.*]] = getelementptr inbounds i64, i64* [[T02]], i64 11			; CHECK-NEXT: [[T322:%.*]] = getelementptr inbounds i64, i64* [[T02]], i64 11
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[T14]] to <4 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[T142]] to <2 x i64>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[T142]] to <2 x i64>*			; CHECK-NEXT: [[TMP3:%.*]] = add nsw <2 x i64> [[TMP2]], <i64 4, i64 4>
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* [[TMP3]], align 8			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i64* [[T212]] to <2 x i64>*
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[T222]] to <2 x i64>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[T222]] to <2 x i64>*
	; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* [[TMP5]], align 8			; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* [[TMP5]], align 8
	; CHECK-NEXT: [[TMP7:%.*]] = add nsw <4 x i32> [[TMP2]], <i32 4, i32 4, i32 6, i32 7>			; CHECK-NEXT: [[TMP7:%.*]] = add nsw <2 x i64> [[TMP6]], <i64 6, i64 7>
	; CHECK-NEXT: [[TMP8:%.*]] = add nsw <2 x i64> [[TMP4]], <i64 4, i64 4>			; CHECK-NEXT: [[TMP8:%.*]] = bitcast i64* [[T292]] to <2 x i64>*
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw <2 x i64> [[TMP6]], <i64 6, i64 7>			; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[T14]] to <4 x i32>*
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast i64* [[T212]] to <2 x i64>*			; CHECK-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4
	; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64>* [[TMP10]], align 8			; CHECK-NEXT: [[TMP11:%.*]] = add nsw <4 x i32> [[TMP10]], <i32 4, i32 4, i32 6, i32 7>
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i64* [[T292]] to <2 x i64>*			; CHECK-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* [[TMP4]], align 8
	; CHECK-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* [[TMP11]], align 8			; CHECK-NEXT: store <2 x i64> [[TMP7]], <2 x i64>* [[TMP8]], align 8
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[T21]] to <4 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[T21]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP12]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* [[TMP12]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%t0 = bitcast i8* %v0 to i32*			%t0 = bitcast i8* %v0 to i32*
	%t1 = bitcast i8* %v1 to i32*			%t1 = bitcast i8* %v1 to i32*

	%t02 = bitcast i8* %v0 to i64*			%t02 = bitcast i8* %v0 to i64*
	%t12 = bitcast i8* %v1 to i64*			%t12 = bitcast i8* %v1 to i64*


llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll

	; CHECK-NEXT: store i32 [[U:%.*]], i32* [[U_ADDR]], align 4			; CHECK-NEXT: store i32 [[U:%.*]], i32* [[U_ADDR]], align 4
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[U]], 3			; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[U]], 3
	; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[MUL]] to i64			; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[MUL]] to i64
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]]
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]]			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]]
	; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[MUL]], 1			; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[MUL]], 1
	; CHECK-NEXT: [[IDXPROM12:%.*]] = sext i32 [[ADD11]] to i64			; CHECK-NEXT: [[IDXPROM12:%.*]] = sext i32 [[ADD11]] to i64
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]]			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]]
				; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: [[ADD24:%.*]] = add nsw i32 [[MUL]], 2			; CHECK-NEXT: [[ADD24:%.*]] = add nsw i32 [[MUL]], 2
	; CHECK-NEXT: [[IDXPROM25:%.*]] = sext i32 [[ADD24]] to i64			; CHECK-NEXT: [[IDXPROM25:%.*]] = sext i32 [[ADD24]] to i64
	; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM25]]			; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM25]]
	; CHECK-NEXT: store i32 [[U:%.*]], i32* [[U_ADDR]], align 4			; CHECK-NEXT: store i32 [[U:%.*]], i32* [[U_ADDR]], align 4
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[U]], 2			; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[U]], 2
	; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[MUL]] to i64			; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[MUL]] to i64
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]]
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]]			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]]
	; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[MUL]], 1			; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[MUL]], 1
	; CHECK-NEXT: [[IDXPROM12:%.*]] = sext i32 [[ADD11]] to i64			; CHECK-NEXT: [[IDXPROM12:%.*]] = sext i32 [[ADD11]] to i64
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]]			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]]
				; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x float], [2000 x float]* @D, i32 0, i64 [[IDXPROM12]]			; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x float], [2000 x float]* @D, i32 0, i64 [[IDXPROM12]]
	; CHECK-NEXT: [[ADD24:%.*]] = add nsw i32 [[MUL]], 2			; CHECK-NEXT: [[ADD24:%.*]] = add nsw i32 [[MUL]], 2
	; CHECK-NEXT: [[IDXPROM25:%.*]] = sext i32 [[ADD24]] to i64			; CHECK-NEXT: [[IDXPROM25:%.*]] = sext i32 [[ADD24]] to i64
	; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [2000 x float], [2000 x float]* @C, i32 0, i64 [[IDXPROM25]]			; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [2000 x float], [2000 x float]* @C, i32 0, i64 [[IDXPROM25]]
	; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [2000 x float], [2000 x float]* @D, i32 0, i64 [[IDXPROM25]]			; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [2000 x float], [2000 x float]* @D, i32 0, i64 [[IDXPROM25]]
	; CHECK-NEXT: [[ADD37:%.*]] = add nsw i32 [[MUL]], 3			; CHECK-NEXT: [[ADD37:%.*]] = add nsw i32 [[MUL]], 3
	; CHECK-NEXT: [[IDXPROM38:%.*]] = sext i32 [[ADD37]] to i64			; CHECK-NEXT: [[IDXPROM38:%.*]] = sext i32 [[ADD37]] to i64
	; CHECK-NEXT: [[ARRAYIDX39:%.*]] = getelementptr inbounds [2000 x float], [2000 x float]* @C, i32 0, i64 [[IDXPROM38]]			; CHECK-NEXT: [[ARRAYIDX39:%.*]] = getelementptr inbounds [2000 x float], [2000 x float]* @C, i32 0, i64 [[IDXPROM38]]
				; CHECK-NEXT: [[ARRAYIDX43:%.*]] = getelementptr inbounds [2000 x float], [2000 x float]* @D, i32 0, i64 [[IDXPROM38]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX43:%.*]] = getelementptr inbounds [2000 x float], [2000 x float]* @D, i32 0, i64 [[IDXPROM38]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX4]] to <4 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX4]] to <4 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <4 x float> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <4 x float> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP4]], <4 x float>* [[TMP5]], align 4			; CHECK-NEXT: store <4 x float> [[TMP4]], <4 x float>* [[TMP5]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	; CHECK-NEXT: [[MUL:%.*]] = mul i32 [[U]], 6			; CHECK-NEXT: [[MUL:%.*]] = mul i32 [[U]], 6
	; CHECK-NEXT: [[ADD6:%.*]] = add i32 [[MUL]], 6			; CHECK-NEXT: [[ADD6:%.*]] = add i32 [[MUL]], 6
	; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[ADD6]] to i64			; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[ADD6]] to i64
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]]
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]]			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]]
	; CHECK-NEXT: [[ADD7:%.*]] = add i32 [[MUL]], 7			; CHECK-NEXT: [[ADD7:%.*]] = add i32 [[MUL]], 7
	; CHECK-NEXT: [[IDXPROM12:%.*]] = sext i32 [[ADD7]] to i64			; CHECK-NEXT: [[IDXPROM12:%.*]] = sext i32 [[ADD7]] to i64
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]]			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]]
				; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	[29 lines not shown]
	; CHECK-NEXT: [[MUL:%.*]] = mul i32 [[U]], 6			; CHECK-NEXT: [[MUL:%.*]] = mul i32 [[U]], 6
	; CHECK-NEXT: [[ADD6:%.*]] = add i32 [[MUL]], 6			; CHECK-NEXT: [[ADD6:%.*]] = add i32 [[MUL]], 6
	; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[ADD6]] to i64			; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[ADD6]] to i64
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]]
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]]			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]]
	; CHECK-NEXT: [[ADD7:%.*]] = add i32 [[MUL]], 7			; CHECK-NEXT: [[ADD7:%.*]] = add i32 [[MUL]], 7
	; CHECK-NEXT: [[IDXPROM12:%.*]] = zext i32 [[ADD7]] to i64			; CHECK-NEXT: [[IDXPROM12:%.*]] = zext i32 [[ADD7]] to i64
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]]			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]]
				; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	[227 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/continue_vectorizing.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	; We will keep trying to vectorize the basic block even we already find vectorized store.			; We will keep trying to vectorize the basic block even we already find vectorized store.
	define void @test1(double* %a, double* %b, double* %c, double* %d) {			define void @test1(double* %a, double* %b, double* %c, double* %d) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 1
				; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 1
				; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[B]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[B]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 1
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[C]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[C]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[A]] to <4 x i32>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[B]] to <4 x i32>*
	; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i32>, <4 x i32>* [[TMP8]], align 8			; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i32>, <4 x i32>* [[TMP8]], align 8
	; CHECK-NEXT: [[TMP10:%.*]] = mul <4 x i32> [[TMP7]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = mul <4 x i32> [[TMP7]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[D:%.*]] to <4 x i32>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[D:%.*]] to <4 x i32>*
	[24 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -slp-min-tree-size=2 -slp-threshold=-1000 -slp-max-look-ahead-depth=1 -slp-schedule-budget=27 -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s			; RUN: opt < %s -slp-vectorizer -slp-min-tree-size=2 -slp-threshold=-1000 -slp-max-look-ahead-depth=1 -slp-schedule-budget=27 -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s

	define void @exceed(double %0, double %1) {			define void @exceed(double %0, double %1) {
	; CHECK-LABEL: @exceed(			; CHECK-LABEL: @exceed(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[TMP0:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[TMP1:%.*]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = fdiv fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP6]], i32 1
	; CHECK-NEXT: [[IX:%.*]] = fmul double [[TMP7]], undef
	; CHECK-NEXT: [[IXX0:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX0:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX1:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX1:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX2:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX2:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX3:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX3:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX4:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX4:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX5:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX5:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IX1:%.*]] = fmul double [[TMP7]], undef
	; CHECK-NEXT: [[IXX10:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX10:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX11:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX11:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX12:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX12:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX13:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX13:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef
				; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef
				; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[TMP0:%.*]], i32 0
				; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP0]], i32 1
				; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[TMP1:%.*]], i32 0
				; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[TMP1]], i32 1
				; CHECK-NEXT: [[TMP6:%.*]] = fdiv fast <2 x double> [[TMP3]], [[TMP5]]
				; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP6]], i32 1
				; CHECK-NEXT: [[IX:%.*]] = fmul double [[TMP7]], undef
				; CHECK-NEXT: [[IX1:%.*]] = fmul double [[TMP7]], undef
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]			; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[TMP7]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP14]], undef			; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP14]], undef
	; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [			; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [
	; CHECK-NEXT: i32 0, label [[BB2:%.*]]			; CHECK-NEXT: i32 0, label [[BB2:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: br label [[LABEL:%.*]]			; CHECK-NEXT: br label [[LABEL:%.*]]
	[51 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/crash_mandeltext.ll

[87 lines not shown]	for.end48: ; preds = %for.end44
ret void		ret void
}		}

%struct.hoge = type { double, double, double}		%struct.hoge = type { double, double, double}

define void @zot(%struct.hoge* %arg) {		define void @zot(%struct.hoge* %arg) {
; CHECK-LABEL: @zot(		; CHECK-LABEL: @zot(
; CHECK-NEXT: bb:		; CHECK-NEXT: bb:
		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [[STRUCT_HOGE:%.*]], %struct.hoge* [[ARG:%.*]], i64 0, i32 1
; CHECK-NEXT: [[TMP:%.*]] = load double, double* undef, align 8		; CHECK-NEXT: [[TMP:%.*]] = load double, double* undef, align 8
; CHECK-NEXT: [[TMP2:%.*]] = load double, double* undef, align 8		; CHECK-NEXT: [[TMP2:%.*]] = load double, double* undef, align 8
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[TMP2]], i32 0		; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[TMP2]], i32 0
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[TMP]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[TMP]], i32 1
; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> [[TMP1]], undef		; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> [[TMP1]], undef
; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [[STRUCT_HOGE:%.*]], %struct.hoge* [[ARG:%.*]], i64 0, i32 1
; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], undef		; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], undef
; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP3]], undef		; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP3]], undef
; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TMP7]] to <2 x double>*		; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TMP7]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8		; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
; CHECK-NEXT: br i1 undef, label [[BB11:%.*]], label [[BB12:%.*]]		; CHECK-NEXT: br i1 undef, label [[BB11:%.*]], label [[BB12:%.*]]
; CHECK: bb11:		; CHECK: bb11:
; CHECK-NEXT: br label [[BB14:%.*]]		; CHECK-NEXT: br label [[BB14:%.*]]
; CHECK: bb12:		; CHECK: bb12:
[71 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

	[30 lines not shown]
	; CHECK: cond.false66.us:			; CHECK: cond.false66.us:
	; CHECK-NEXT: [[ADD_I276_US:%.*]] = fadd double 0.000000e+00, undef			; CHECK-NEXT: [[ADD_I276_US:%.*]] = fadd double 0.000000e+00, undef
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double poison, double 0xBFA5CC2D1960285F>, double [[ADD_I276_US]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double poison, double 0xBFA5CC2D1960285F>, double [[ADD_I276_US]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> <double 0.000000e+00, double undef>, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> <double 0.000000e+00, double undef>, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.400000e+02, double 1.400000e+02>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.400000e+02, double 1.400000e+02>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 5.000000e+01, double 5.200000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 5.000000e+01, double 5.200000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP4]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[AGG_TMP99208_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP5]], i32 1			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[AGG_TMP99208_SROA_0_0_IDX]] to <2 x double>*			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP5]], i32 1
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP9]], align 8			; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x double> [[TMP7]], [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[AGG_TMP101211_SROA_0_0_IDX]] to <2 x double>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[AGG_TMP101211_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP8]], <2 x double>* [[TMP10]], align 8			; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: cond.true63.us:			; CHECK: cond.true63.us:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: for.body42.lr.ph.us:			; CHECK: for.body42.lr.ph.us:
	; CHECK-NEXT: br i1 undef, label [[COND_TRUE48_US:%.*]], label [[COND_FALSE51_US:%.*]]			; CHECK-NEXT: br i1 undef, label [[COND_TRUE48_US:%.*]], label [[COND_FALSE51_US:%.*]]
	; CHECK: _Z5clampd.exit.1:			; CHECK: _Z5clampd.exit.1:
	; CHECK-NEXT: br label [[FOR_COND36_PREHEADER]]			; CHECK-NEXT: br label [[FOR_COND36_PREHEADER]]
	;			;
	[55 lines not shown]
	%struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601 = type { %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600, %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 }			%struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601 = type { %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600, %struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 }
	%struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 = type { double, double, double }			%struct.Vec.0.6.48.90.132.186.192.198.234.252.258.264.270.276.282.288.378.432.438.450.456.594.600 = type { double, double, double }

	define void @_Z8radianceRK3RayiPt() #0 {			define void @_Z8radianceRK3RayiPt() #0 {
	; CHECK-LABEL: @_Z8radianceRK3RayiPt(			; CHECK-LABEL: @_Z8radianceRK3RayiPt(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN78:%.*]], label [[IF_THEN38:%.*]]
	; CHECK: if.then38:			; CHECK: if.then38:
				; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY_5_11_53_95_137_191_197_203_239_257_263_269_275_281_287_293_383_437_443_455_461_599_601:%.*]], %struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double undef, double poison>, double undef, i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double undef, double poison>, double undef, i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> undef, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> undef, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> undef, [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> undef, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> undef, [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> undef, [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> undef, [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> undef, [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> undef, [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> undef, [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> undef, [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> undef, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x double> undef, [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x double> undef, [[TMP6]]
	; CHECK-NEXT: [[AGG_TMP74663_SROA_0_0_IDX:%.*]] = getelementptr inbounds [[STRUCT_RAY_5_11_53_95_137_191_197_203_239_257_263_269_275_281_287_293_383_437_443_455_461_599_601:%.*]], %struct.Ray.5.11.53.95.137.191.197.203.239.257.263.269.275.281.287.293.383.437.443.455.461.599.601* undef, i64 0, i32 1, i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[AGG_TMP74663_SROA_0_0_IDX]] to <2 x double>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[AGG_TMP74663_SROA_0_0_IDX]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8			; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
	; CHECK-NEXT: br label [[RETURN:%.*]]			; CHECK-NEXT: br label [[RETURN:%.*]]
	; CHECK: if.then78:			; CHECK: if.then78:
	; CHECK-NEXT: br label [[RETURN]]			; CHECK-NEXT: br label [[RETURN]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	[36 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

	[10 lines not shown]
	; G[3] = 8+G[6]*4;			; G[3] = 8+G[6]*4;
	;}			;}

	define i32 @test(double* nocapture %G) {			define i32 @test(double* nocapture %G) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[G]], i64 6			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[G]], i64 6
				; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[G]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 4.000000e+00, double 3.000000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 4.000000e+00, double 3.000000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+00, double 6.000000e+00>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+00, double 6.000000e+00>
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[G]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
	; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP6]], 4.000000e+00			; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP6]], 4.000000e+00
				; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds double, double* [[G]], i64 3
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[TMP5]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[MUL11]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[MUL11]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP8]], <double 7.000000e+00, double 8.000000e+00>			; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP8]], <double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds double, double* [[G]], i64 3
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[ARRAYIDX9]] to <2 x double>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[ARRAYIDX9]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8			; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds double, double* %G, i64 5			%arrayidx = getelementptr inbounds double, double* %G, i64 5
	%0 = load double, double* %arrayidx, align 8			%0 = load double, double* %arrayidx, align 8
	%mul = fmul double %0, 4.000000e+00			%mul = fmul double %0, 4.000000e+00
	[86 lines not shown]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5
	; CHECK-NEXT: [[TMP3:%.*]] = load double, double* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load double, double* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = fmul double [[TMP3]], 4.000000e+00			; CHECK-NEXT: [[TMP4:%.*]] = fmul double [[TMP3]], 4.000000e+00
	; CHECK-NEXT: br i1 [[TMP1]], label [[TMP14:%.*]], label [[TMP5:%.*]]			; CHECK-NEXT: br i1 [[TMP1]], label [[TMP14:%.*]], label [[TMP5:%.*]]
	; CHECK: 5:			; CHECK: 5:
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[G]], i64 6			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[G]], i64 6
	; CHECK-NEXT: [[TMP7:%.*]] = load double, double* [[TMP6]], align 8			; CHECK-NEXT: [[TMP7:%.*]] = load double, double* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = fmul double [[TMP7]], 3.000000e+00			; CHECK-NEXT: [[TMP8:%.*]] = fmul double [[TMP7]], 3.000000e+00
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> poison, double [[TMP4]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[G]], i64 1
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x double> [[TMP9]], double [[TMP8]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x double> poison, double [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP10]], <double 1.000000e+00, double 6.000000e+00>			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x double> [[TMP10]], double [[TMP8]], i32 1
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds double, double* [[G]], i64 1			; CHECK-NEXT: [[TMP12:%.*]] = fadd <2 x double> [[TMP11]], <double 1.000000e+00, double 6.000000e+00>
	; CHECK-NEXT: [[TMP13:%.*]] = bitcast double* [[G]] to <2 x double>*			; CHECK-NEXT: [[TMP13:%.*]] = bitcast double* [[G]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP13]], align 8			; CHECK-NEXT: store <2 x double> [[TMP12]], <2 x double>* [[TMP13]], align 8
	; CHECK-NEXT: br label [[TMP24:%.*]]			; CHECK-NEXT: br label [[TMP24:%.*]]
	; CHECK: 14:			; CHECK: 14:
	; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds double, double* [[G]], i64 2			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds double, double* [[G]], i64 2
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds double, double* [[G]], i64 6			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds double, double* [[G]], i64 6
	; CHECK-NEXT: [[TMP17:%.*]] = load double, double* [[TMP16]], align 8			; CHECK-NEXT: [[TMP17:%.*]] = load double, double* [[TMP16]], align 8
	; CHECK-NEXT: [[TMP18:%.*]] = fmul double [[TMP17]], 3.000000e+00			; CHECK-NEXT: [[TMP18:%.*]] = fmul double [[TMP17]], 3.000000e+00
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x double> poison, double [[TMP4]], i32 0			; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds double, double* [[G]], i64 3
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x double> [[TMP19]], double [[TMP18]], i32 1			; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x double> poison, double [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP21:%.*]] = fadd <2 x double> [[TMP20]], <double 7.000000e+00, double 8.000000e+00>			; CHECK-NEXT: [[TMP21:%.*]] = insertelement <2 x double> [[TMP20]], double [[TMP18]], i32 1
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds double, double* [[G]], i64 3			; CHECK-NEXT: [[TMP22:%.*]] = fadd <2 x double> [[TMP21]], <double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[TMP23:%.*]] = bitcast double* [[TMP15]] to <2 x double>*			; CHECK-NEXT: [[TMP23:%.*]] = bitcast double* [[TMP15]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP21]], <2 x double>* [[TMP23]], align 8			; CHECK-NEXT: store <2 x double> [[TMP22]], <2 x double>* [[TMP23]], align 8
	; CHECK-NEXT: br label [[TMP24]]			; CHECK-NEXT: br label [[TMP24]]
	; CHECK: 24:			; CHECK: 24:
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%1 = icmp eq i32 %k, 0			%1 = icmp eq i32 %k, 0
	%2 = getelementptr inbounds double, double* %G, i64 5			%2 = getelementptr inbounds double, double* %G, i64 5
	%3 = load double, double* %2, align 8			%3 = load double, double* %2, align 8
	%4 = fmul double %3, 4.000000e+00			%4 = fmul double %3, 4.000000e+00
	[100 lines not shown]
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[A]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[A]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[N]], 4			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[N]], 4
	; CHECK-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[IF_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[IF_END:%.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds double, double* [[A]], i64 2			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds double, double* [[A]], i64 2
	; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds double, double* [[A]], i64 3			; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds double, double* [[A]], i64 3
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[ARRAYIDX7]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[N]], 4			; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[N]], 4
	; CHECK-NEXT: [[CONV12:%.*]] = sitofp i32 [[ADD]] to double			; CHECK-NEXT: [[CONV12:%.*]] = sitofp i32 [[ADD]] to double
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[ARRAYIDX7]] to <2 x double>*
				; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP2]], double [[CONV12]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP2]], double [[CONV12]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x double> [[TMP8]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x double> [[TMP8]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[ARRAYIDX7]] to <2 x double>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[ARRAYIDX7]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8			; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
	; CHECK-NEXT: br label [[RETURN]]			; CHECK-NEXT: br label [[RETURN]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;

llvm/test/Transforms/SLPVectorizer/X86/ctlz.ll

; SSE2-NEXT: store i32 [[CTLZ4]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4), align 2		; SSE2-NEXT: store i32 [[CTLZ4]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4), align 2
; SSE2-NEXT: store i32 [[CTLZ5]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 5), align 2		; SSE2-NEXT: store i32 [[CTLZ5]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 5), align 2
; SSE2-NEXT: store i32 [[CTLZ6]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 6), align 2		; SSE2-NEXT: store i32 [[CTLZ6]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 6), align 2
; SSE2-NEXT: store i32 [[CTLZ7]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 7), align 2		; SSE2-NEXT: store i32 [[CTLZ7]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 7), align 2
; SSE2-NEXT: ret void		; SSE2-NEXT: ret void
;		;
; SSE42-LABEL: @ctlz_8i32(		; SSE42-LABEL: @ctlz_8i32(
; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[TMP1]], i1 false)
; SSE42-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[TMP1]], i1 false)		; SSE42-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[TMP2]], i1 false)		; SSE42-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[TMP3]], i1 false)
; SSE42-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE42-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: ret void		; SSE42-NEXT: ret void
;		;
; AVX-LABEL: @ctlz_8i32(		; AVX-LABEL: @ctlz_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.ctlz.v8i32(<8 x i32> [[TMP1]], i1 false)		; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.ctlz.v8i32(<8 x i32> [[TMP1]], i1 false)
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store i16 %ctlz6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2		store i16 %ctlz6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2
store i16 %ctlz7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2		store i16 %ctlz7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2
ret void		ret void
}		}

define void @ctlz_16i16() #0 {		define void @ctlz_16i16() #0 {
; SSE-LABEL: @ctlz_16i16(		; SSE-LABEL: @ctlz_16i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.ctlz.v8i16(<8 x i16> [[TMP1]], i1 false)
; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.ctlz.v8i16(<8 x i16> [[TMP1]], i1 false)		; SSE-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.ctlz.v8i16(<8 x i16> [[TMP2]], i1 false)		; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.ctlz.v8i16(<8 x i16> [[TMP3]], i1 false)
; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @ctlz_16i16(		; AVX-LABEL: @ctlz_16i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.ctlz.v16i16(<16 x i16> [[TMP1]], i1 false)		; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.ctlz.v16i16(<16 x i16> [[TMP1]], i1 false)
; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store i8 %ctlz14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1		store i8 %ctlz14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1
store i8 %ctlz15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1		store i8 %ctlz15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1
ret void		ret void
}		}

define void @ctlz_32i8() #0 {		define void @ctlz_32i8() #0 {
; SSE-LABEL: @ctlz_32i8(		; SSE-LABEL: @ctlz_32i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> [[TMP1]], i1 false)
; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> [[TMP1]], i1 false)		; SSE-NEXT: store <16 x i8> [[TMP2]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> [[TMP2]], i1 false)		; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> [[TMP3]], i1 false)
; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @ctlz_32i8(		; AVX-LABEL: @ctlz_32i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.ctlz.v32i8(<32 x i8> [[TMP1]], i1 false)		; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.ctlz.v32i8(<32 x i8> [[TMP1]], i1 false)
; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
; SSE2-NEXT: store i32 [[CTLZ4]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4), align 2		; SSE2-NEXT: store i32 [[CTLZ4]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4), align 2
; SSE2-NEXT: store i32 [[CTLZ5]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 5), align 2		; SSE2-NEXT: store i32 [[CTLZ5]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 5), align 2
; SSE2-NEXT: store i32 [[CTLZ6]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 6), align 2		; SSE2-NEXT: store i32 [[CTLZ6]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 6), align 2
; SSE2-NEXT: store i32 [[CTLZ7]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 7), align 2		; SSE2-NEXT: store i32 [[CTLZ7]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 7), align 2
; SSE2-NEXT: ret void		; SSE2-NEXT: ret void
;		;
; SSE42-LABEL: @ctlz_undef_8i32(		; SSE42-LABEL: @ctlz_undef_8i32(
; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[TMP1]], i1 true)
; SSE42-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[TMP1]], i1 true)		; SSE42-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[TMP2]], i1 true)		; SSE42-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[TMP3]], i1 true)
; SSE42-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE42-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: ret void		; SSE42-NEXT: ret void
;		;
; AVX-LABEL: @ctlz_undef_8i32(		; AVX-LABEL: @ctlz_undef_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.ctlz.v8i32(<8 x i32> [[TMP1]], i1 true)		; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.ctlz.v8i32(<8 x i32> [[TMP1]], i1 true)
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store i16 %ctlz6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2		store i16 %ctlz6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2
store i16 %ctlz7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2		store i16 %ctlz7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2
ret void		ret void
}		}

define void @ctlz_undef_16i16() #0 {		define void @ctlz_undef_16i16() #0 {
; SSE-LABEL: @ctlz_undef_16i16(		; SSE-LABEL: @ctlz_undef_16i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.ctlz.v8i16(<8 x i16> [[TMP1]], i1 true)
; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.ctlz.v8i16(<8 x i16> [[TMP1]], i1 true)		; SSE-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.ctlz.v8i16(<8 x i16> [[TMP2]], i1 true)		; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.ctlz.v8i16(<8 x i16> [[TMP3]], i1 true)
; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @ctlz_undef_16i16(		; AVX-LABEL: @ctlz_undef_16i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.ctlz.v16i16(<16 x i16> [[TMP1]], i1 true)		; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.ctlz.v16i16(<16 x i16> [[TMP1]], i1 true)
; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store i8 %ctlz14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1		store i8 %ctlz14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1
store i8 %ctlz15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1		store i8 %ctlz15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1
ret void		ret void
}		}

define void @ctlz_undef_32i8() #0 {		define void @ctlz_undef_32i8() #0 {
; SSE-LABEL: @ctlz_undef_32i8(		; SSE-LABEL: @ctlz_undef_32i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> [[TMP1]], i1 true)
; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> [[TMP1]], i1 true)		; SSE-NEXT: store <16 x i8> [[TMP2]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> [[TMP2]], i1 true)		; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> [[TMP3]], i1 true)
; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @ctlz_undef_32i8(		; AVX-LABEL: @ctlz_undef_32i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.ctlz.v32i8(<32 x i8> [[TMP1]], i1 true)		; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.ctlz.v32i8(<32 x i8> [[TMP1]], i1 true)
; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/ctpop.ll

store i64 %ctpop0, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i32 0, i64 0), align 8		store i64 %ctpop0, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i32 0, i64 0), align 8
store i64 %ctpop1, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i32 0, i64 1), align 8		store i64 %ctpop1, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @ctpop_4i64() #0 {		define void @ctpop_4i64() #0 {
; SSE2-LABEL: @ctpop_4i64(		; SSE2-LABEL: @ctpop_4i64(
; SSE2-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([4 x i64]* @src64 to <2 x i64>*), align 4		; SSE2-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([4 x i64]* @src64 to <2 x i64>*), align 4
; SSE2-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 2) to <2 x i64>*), align 4		; SSE2-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.ctpop.v2i64(<2 x i64> [[TMP1]])
; SSE2-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.ctpop.v2i64(<2 x i64> [[TMP1]])		; SSE2-NEXT: store <2 x i64> [[TMP2]], <2 x i64>* bitcast ([4 x i64]* @dst64 to <2 x i64>*), align 4
; SSE2-NEXT: [[TMP4:%.*]] = call <2 x i64> @llvm.ctpop.v2i64(<2 x i64> [[TMP2]])		; SSE2-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 2) to <2 x i64>*), align 4
; SSE2-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([4 x i64]* @dst64 to <2 x i64>*), align 4		; SSE2-NEXT: [[TMP4:%.*]] = call <2 x i64> @llvm.ctpop.v2i64(<2 x i64> [[TMP3]])
; SSE2-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* bitcast (i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 2) to <2 x i64>*), align 4		; SSE2-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* bitcast (i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 2) to <2 x i64>*), align 4
; SSE2-NEXT: ret void		; SSE2-NEXT: ret void
;		;
; SSE42-LABEL: @ctpop_4i64(		; SSE42-LABEL: @ctpop_4i64(
; SSE42-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 0), align 4		; SSE42-NEXT: [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 0), align 4
; SSE42-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 1), align 4		; SSE42-NEXT: [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 1), align 4
; SSE42-NEXT: [[LD2:%.*]] = load i64, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 2), align 4		; SSE42-NEXT: [[LD2:%.*]] = load i64, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 2), align 4
; SSE42-NEXT: [[LD3:%.*]] = load i64, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 3), align 4		; SSE42-NEXT: [[LD3:%.*]] = load i64, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @src64, i64 0, i64 3), align 4
store i32 %ctpop2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4		store i32 %ctpop2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4
store i32 %ctpop3, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4		store i32 %ctpop3, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @ctpop_8i32() #0 {		define void @ctpop_8i32() #0 {
; SSE2-LABEL: @ctpop_8i32(		; SSE2-LABEL: @ctpop_8i32(
; SSE2-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2		; SSE2-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2
; SSE2-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE2-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> [[TMP1]])
; SSE2-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> [[TMP1]])		; SSE2-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2
; SSE2-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> [[TMP2]])		; SSE2-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE2-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2		; SSE2-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> [[TMP3]])
; SSE2-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE2-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE2-NEXT: ret void		; SSE2-NEXT: ret void
;		;
; SSE42-LABEL: @ctpop_8i32(		; SSE42-LABEL: @ctpop_8i32(
; SSE42-NEXT: [[LD0:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 0), align 2		; SSE42-NEXT: [[LD0:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 0), align 2
; SSE42-NEXT: [[LD1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 1), align 2		; SSE42-NEXT: [[LD1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 1), align 2
; SSE42-NEXT: [[LD2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 2), align 2		; SSE42-NEXT: [[LD2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 2), align 2
; SSE42-NEXT: [[LD3:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 3), align 2		; SSE42-NEXT: [[LD3:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 3), align 2
store i16 %ctpop6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2		store i16 %ctpop6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2
store i16 %ctpop7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2		store i16 %ctpop7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2
ret void		ret void
}		}

define void @ctpop_16i16() #0 {		define void @ctpop_16i16() #0 {
; SSE-LABEL: @ctpop_16i16(		; SSE-LABEL: @ctpop_16i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.ctpop.v8i16(<8 x i16> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.ctpop.v8i16(<8 x i16> [[TMP1]])		; SSE-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.ctpop.v8i16(<8 x i16> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.ctpop.v8i16(<8 x i16> [[TMP3]])
; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @ctpop_16i16(		; AVX-LABEL: @ctpop_16i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.ctpop.v16i16(<16 x i16> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.ctpop.v16i16(<16 x i16> [[TMP1]])
; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store i8 %ctpop14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1		store i8 %ctpop14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1
store i8 %ctpop15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1		store i8 %ctpop15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1
ret void		ret void
}		}

define void @ctpop_32i8() #0 {		define void @ctpop_32i8() #0 {
; SSE-LABEL: @ctpop_32i8(		; SSE-LABEL: @ctpop_32i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.ctpop.v16i8(<16 x i8> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.ctpop.v16i8(<16 x i8> [[TMP1]])		; SSE-NEXT: store <16 x i8> [[TMP2]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.ctpop.v16i8(<16 x i8> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.ctpop.v16i8(<16 x i8> [[TMP3]])
; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @ctpop_32i8(		; AVX-LABEL: @ctpop_32i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.ctpop.v32i8(<32 x i8> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.ctpop.v32i8(<32 x i8> [[TMP1]])
; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/cttz.ll

; SSE2-NEXT: store i32 [[CTTZ4]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4), align 2		; SSE2-NEXT: store i32 [[CTTZ4]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4), align 2
; SSE2-NEXT: store i32 [[CTTZ5]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 5), align 2		; SSE2-NEXT: store i32 [[CTTZ5]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 5), align 2
; SSE2-NEXT: store i32 [[CTTZ6]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 6), align 2		; SSE2-NEXT: store i32 [[CTTZ6]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 6), align 2
; SSE2-NEXT: store i32 [[CTTZ7]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 7), align 2		; SSE2-NEXT: store i32 [[CTTZ7]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 7), align 2
; SSE2-NEXT: ret void		; SSE2-NEXT: ret void
;		;
; SSE42-LABEL: @cttz_8i32(		; SSE42-LABEL: @cttz_8i32(
; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 false)
; SSE42-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 false)		; SSE42-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP2]], i1 false)		; SSE42-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP3]], i1 false)
; SSE42-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE42-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: ret void		; SSE42-NEXT: ret void
;		;
; AVX-LABEL: @cttz_8i32(		; AVX-LABEL: @cttz_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> [[TMP1]], i1 false)		; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> [[TMP1]], i1 false)
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	;
store i16 %cttz6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2		store i16 %cttz6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2
store i16 %cttz7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2		store i16 %cttz7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2
ret void		ret void
}		}

define void @cttz_16i16() #0 {		define void @cttz_16i16() #0 {
; SSE-LABEL: @cttz_16i16(		; SSE-LABEL: @cttz_16i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> [[TMP1]], i1 false)
; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> [[TMP1]], i1 false)		; SSE-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> [[TMP2]], i1 false)		; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> [[TMP3]], i1 false)
; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @cttz_16i16(		; AVX-LABEL: @cttz_16i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.cttz.v16i16(<16 x i16> [[TMP1]], i1 false)		; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.cttz.v16i16(<16 x i16> [[TMP1]], i1 false)
; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
▲ Show 20 Lines • Show All 105 Lines • ▼ Show 20 Lines	;
store i8 %cttz14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1		store i8 %cttz14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1
store i8 %cttz15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1		store i8 %cttz15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1
ret void		ret void
}		}

define void @cttz_32i8() #0 {		define void @cttz_32i8() #0 {
; SSE-LABEL: @cttz_32i8(		; SSE-LABEL: @cttz_32i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> [[TMP1]], i1 false)
; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> [[TMP1]], i1 false)		; SSE-NEXT: store <16 x i8> [[TMP2]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> [[TMP2]], i1 false)		; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> [[TMP3]], i1 false)
; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @cttz_32i8(		; AVX-LABEL: @cttz_32i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.cttz.v32i8(<32 x i8> [[TMP1]], i1 false)		; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.cttz.v32i8(<32 x i8> [[TMP1]], i1 false)
; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
▲ Show 20 Lines • Show All 240 Lines • ▼ Show 20 Lines
; SSE2-NEXT: store i32 [[CTTZ4]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4), align 2		; SSE2-NEXT: store i32 [[CTTZ4]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4), align 2
; SSE2-NEXT: store i32 [[CTTZ5]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 5), align 2		; SSE2-NEXT: store i32 [[CTTZ5]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 5), align 2
; SSE2-NEXT: store i32 [[CTTZ6]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 6), align 2		; SSE2-NEXT: store i32 [[CTTZ6]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 6), align 2
; SSE2-NEXT: store i32 [[CTTZ7]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 7), align 2		; SSE2-NEXT: store i32 [[CTTZ7]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 7), align 2
; SSE2-NEXT: ret void		; SSE2-NEXT: ret void
;		;
; SSE42-LABEL: @cttz_undef_8i32(		; SSE42-LABEL: @cttz_undef_8i32(
; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 true)
; SSE42-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 true)		; SSE42-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP2]], i1 true)		; SSE42-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2		; SSE42-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP3]], i1 true)
; SSE42-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2		; SSE42-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: ret void		; SSE42-NEXT: ret void
;		;
; AVX-LABEL: @cttz_undef_8i32(		; AVX-LABEL: @cttz_undef_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> [[TMP1]], i1 true)		; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> [[TMP1]], i1 true)
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	;
store i16 %cttz6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2		store i16 %cttz6, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 6), align 2
store i16 %cttz7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2		store i16 %cttz7, i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 7), align 2
ret void		ret void
}		}

define void @cttz_undef_16i16() #0 {		define void @cttz_undef_16i16() #0 {
; SSE-LABEL: @cttz_undef_16i16(		; SSE-LABEL: @cttz_undef_16i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([16 x i16]* @src16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> [[TMP1]], i1 true)
; SSE-NEXT: [[TMP3:%.*]] = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> [[TMP1]], i1 true)		; SSE-NEXT: store <8 x i16> [[TMP2]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> [[TMP2]], i1 true)		; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @src16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([16 x i16]* @dst16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> [[TMP3]], i1 true)
; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP4]], <8 x i16>* bitcast (i16* getelementptr inbounds ([16 x i16], [16 x i16]* @dst16, i16 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @cttz_undef_16i16(		; AVX-LABEL: @cttz_undef_16i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([16 x i16]* @src16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.cttz.v16i16(<16 x i16> [[TMP1]], i1 true)		; AVX-NEXT: [[TMP2:%.*]] = call <16 x i16> @llvm.cttz.v16i16(<16 x i16> [[TMP1]], i1 true)
; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP2]], <16 x i16>* bitcast ([16 x i16]* @dst16 to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
▲ Show 20 Lines • Show All 105 Lines • ▼ Show 20 Lines	;
store i8 %cttz14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1		store i8 %cttz14, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 14), align 1
store i8 %cttz15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1		store i8 %cttz15, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 15), align 1
ret void		ret void
}		}

define void @cttz_undef_32i8() #0 {		define void @cttz_undef_32i8() #0 {
; SSE-LABEL: @cttz_undef_32i8(		; SSE-LABEL: @cttz_undef_32i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([32 x i8]* @src8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> [[TMP1]], i1 true)
; SSE-NEXT: [[TMP3:%.*]] = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> [[TMP1]], i1 true)		; SSE-NEXT: store <16 x i8> [[TMP2]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> [[TMP2]], i1 true)		; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @src8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([32 x i8]* @dst8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> [[TMP3]], i1 true)
; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast (i8* getelementptr inbounds ([32 x i8], [32 x i8]* @dst8, i8 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @cttz_undef_32i8(		; AVX-LABEL: @cttz_undef_32i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([32 x i8]* @src8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.cttz.v32i8(<32 x i8> [[TMP1]], i1 true)		; AVX-NEXT: [[TMP2:%.*]] = call <32 x i8> @llvm.cttz.v32i8(<32 x i8> [[TMP1]], i1 true)
; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP2]], <32 x i8>* bitcast ([32 x i8]* @dst8 to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/diamond.ll

	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[MUL238:%.*]] = add i32 [[M:%.*]], [[N:%.*]]			; CHECK-NEXT: [[MUL238:%.*]] = add i32 [[M:%.*]], [[N:%.*]]
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2			; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3			; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
				; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[MUL238]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[MUL238]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[TMP4]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[TMP4]], align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%0 = load i32, i32* %A, align 4			%0 = load i32, i32* %A, align 4
	%mul238 = add i32 %m, %n			%mul238 = add i32 %m, %n
	%add = mul i32 %0, %mul238			%add = mul i32 %0, %mul238
	Show All 29 Lines
	; CHECK-LABEL: @extr_user(			; CHECK-LABEL: @extr_user(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[MUL238:%.*]] = add i32 [[M:%.*]], [[N:%.*]]			; CHECK-NEXT: [[MUL238:%.*]] = add i32 [[M:%.*]], [[N:%.*]]
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2			; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3			; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
				; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[MUL238]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[MUL238]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[TMP4]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[TMP4]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP1]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP1]], i32 0
	; CHECK-NEXT: ret i32 [[TMP5]]			; CHECK-NEXT: ret i32 [[TMP5]]
	;			;
	entry:			entry:
	%0 = load i32, i32* %A, align 4			%0 = load i32, i32* %A, align 4
	%mul238 = add i32 %m, %n			%mul238 = add i32 %m, %n
	Show All 22 Lines
	; CHECK-LABEL: @extr_user1(			; CHECK-LABEL: @extr_user1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[MUL238:%.*]] = add i32 [[M:%.*]], [[N:%.*]]			; CHECK-NEXT: [[MUL238:%.*]] = add i32 [[M:%.*]], [[N:%.*]]
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2			; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3			; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
				; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[MUL238]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[MUL238]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i32> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[TMP4]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[TMP4]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP1]], i32 1
	; CHECK-NEXT: ret i32 [[TMP5]]			; CHECK-NEXT: ret i32 [[TMP5]]
	;			;
	entry:			entry:
	%0 = load i32, i32* %A, align 4			%0 = load i32, i32* %A, align 4
	%mul238 = add i32 %m, %n			%mul238 = add i32 %m, %n

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -slp-threshold=-1 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -slp-threshold=-1 | FileCheck %s

	define i32 @diamond_broadcast(i32* noalias nocapture %B, i32* noalias nocapture %A) {			define i32 @diamond_broadcast(i32* noalias nocapture %B, i32* noalias nocapture %A) {
	; CHECK-LABEL: @diamond_broadcast(			; CHECK-LABEL: @diamond_broadcast(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4			; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
				; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE]]
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%ld = load i32, i32* %A, align 4			%ld = load i32, i32* %A, align 4
	%mul = mul i32 %ld, %ld			%mul = mul i32 %ld, %ld
	store i32 %mul, i32* %B, align 4			store i32 %mul, i32* %B, align 4

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -slp-threshold=-2 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -slp-threshold=-2 | FileCheck %s

	define i32 @diamond_broadcast(i32* noalias nocapture %B, i32* noalias nocapture %A) {			define i32 @diamond_broadcast(i32* noalias nocapture %B, i32* noalias nocapture %A) {
	; CHECK-LABEL: @diamond_broadcast(			; CHECK-LABEL: @diamond_broadcast(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4			; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
				; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE]]
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%ld = load i32, i32* %A, align 4			%ld = load i32, i32* %A, align 4
	%mul = mul i32 %ld, %ld			%mul = mul i32 %ld, %ld
	store i32 %mul, i32* %B, align 4			store i32 %mul, i32* %B, align 4
	Show All 10 Lines
	}			}

	define i32 @diamond_broadcast2(i32* noalias nocapture %B, i32* noalias nocapture %A) {			define i32 @diamond_broadcast2(i32* noalias nocapture %B, i32* noalias nocapture %A) {
	; CHECK-LABEL: @diamond_broadcast2(			; CHECK-LABEL: @diamond_broadcast2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4			; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
				; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE]]
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%ld = load i32, i32* %A, align 4			%ld = load i32, i32* %A, align 4
	%mul = mul i32 %ld, %ld			%mul = mul i32 %ld, %ld
	store i32 %mul, i32* %B, align 4			store i32 %mul, i32* %B, align 4
	Show All 10 Lines
	}			}

	define i32 @diamond_broadcast3(i32* noalias nocapture %B, i32* noalias nocapture %A) {			define i32 @diamond_broadcast3(i32* noalias nocapture %B, i32* noalias nocapture %A) {
	; CHECK-LABEL: @diamond_broadcast3(			; CHECK-LABEL: @diamond_broadcast3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4			; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[A:%.*]], align 4
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
				; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[LD]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> [[SHUFFLE]], [[SHUFFLE]]
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%ld = load i32, i32* %A, align 4			%ld = load i32, i32* %A, align 4
	%mul = mul i32 %ld, %ld			%mul = mul i32 %ld, %ld
	store i32 %mul, i32* %B, align 4			store i32 %mul, i32* %B, align 4

llvm/test/Transforms/SLPVectorizer/X86/different-vec-widths.ll

	; SSE-NEXT: [[Q0:%.*]] = getelementptr inbounds double, double* [[Q:%.*]], i64 0			; SSE-NEXT: [[Q0:%.*]] = getelementptr inbounds double, double* [[Q:%.*]], i64 0
	; SSE-NEXT: [[Q1:%.*]] = getelementptr inbounds double, double* [[Q]], i64 1			; SSE-NEXT: [[Q1:%.*]] = getelementptr inbounds double, double* [[Q]], i64 1
	; SSE-NEXT: [[Q2:%.*]] = getelementptr inbounds double, double* [[Q]], i64 2			; SSE-NEXT: [[Q2:%.*]] = getelementptr inbounds double, double* [[Q]], i64 2
	; SSE-NEXT: [[Q3:%.*]] = getelementptr inbounds double, double* [[Q]], i64 3			; SSE-NEXT: [[Q3:%.*]] = getelementptr inbounds double, double* [[Q]], i64 3
	; SSE-NEXT: [[Q4:%.*]] = getelementptr inbounds double, double* [[Q]], i64 4			; SSE-NEXT: [[Q4:%.*]] = getelementptr inbounds double, double* [[Q]], i64 4
	; SSE-NEXT: [[Q5:%.*]] = getelementptr inbounds double, double* [[Q]], i64 5			; SSE-NEXT: [[Q5:%.*]] = getelementptr inbounds double, double* [[Q]], i64 5
	; SSE-NEXT: [[TMP1:%.*]] = bitcast double* [[P0]] to <2 x double>*			; SSE-NEXT: [[TMP1:%.*]] = bitcast double* [[P0]] to <2 x double>*
	; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8			; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8
	; SSE-NEXT: [[TMP3:%.*]] = bitcast double* [[P2]] to <2 x double>*			; SSE-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+00, double 1.000000e+00>
	; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 8			; SSE-NEXT: [[TMP4:%.*]] = bitcast double* [[Q0]] to <2 x double>*
	; SSE-NEXT: [[TMP5:%.*]] = bitcast double* [[P4]] to <2 x double>*			; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
				; SSE-NEXT: [[TMP5:%.*]] = bitcast double* [[P2]] to <2 x double>*
	; SSE-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* [[TMP5]], align 8			; SSE-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* [[TMP5]], align 8
	; SSE-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+00, double 1.000000e+00>			; SSE-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP6]], <double 1.000000e+00, double 1.000000e+00>
	; SSE-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP4]], <double 1.000000e+00, double 1.000000e+00>			; SSE-NEXT: [[TMP8:%.*]] = bitcast double* [[Q2]] to <2 x double>*
	; SSE-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP6]], <double 1.000000e+00, double 1.000000e+00>			; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
	; SSE-NEXT: [[TMP10:%.*]] = bitcast double* [[Q0]] to <2 x double>*			; SSE-NEXT: [[TMP9:%.*]] = bitcast double* [[P4]] to <2 x double>*
	; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP10]], align 8			; SSE-NEXT: [[TMP10:%.*]] = load <2 x double>, <2 x double>* [[TMP9]], align 8
	; SSE-NEXT: [[TMP11:%.*]] = bitcast double* [[Q2]] to <2 x double>*			; SSE-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP10]], <double 1.000000e+00, double 1.000000e+00>
	; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* [[TMP11]], align 8
	; SSE-NEXT: [[TMP12:%.*]] = bitcast double* [[Q4]] to <2 x double>*			; SSE-NEXT: [[TMP12:%.*]] = bitcast double* [[Q4]] to <2 x double>*
	; SSE-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP12]], align 8			; SSE-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP12]], align 8
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @PR28457(			; AVX-LABEL: @PR28457(
	; AVX-NEXT: [[P0:%.*]] = getelementptr inbounds double, double* [[P:%.*]], i64 0			; AVX-NEXT: [[P0:%.*]] = getelementptr inbounds double, double* [[P:%.*]], i64 0
	; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds double, double* [[P]], i64 1			; AVX-NEXT: [[P1:%.*]] = getelementptr inbounds double, double* [[P]], i64 1
	; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds double, double* [[P]], i64 2			; AVX-NEXT: [[P2:%.*]] = getelementptr inbounds double, double* [[P]], i64 2
	; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds double, double* [[P]], i64 3			; AVX-NEXT: [[P3:%.*]] = getelementptr inbounds double, double* [[P]], i64 3
	; AVX-NEXT: [[P4:%.*]] = getelementptr inbounds double, double* [[P]], i64 4			; AVX-NEXT: [[P4:%.*]] = getelementptr inbounds double, double* [[P]], i64 4
	; AVX-NEXT: [[P5:%.*]] = getelementptr inbounds double, double* [[P]], i64 5			; AVX-NEXT: [[P5:%.*]] = getelementptr inbounds double, double* [[P]], i64 5
	; AVX-NEXT: [[Q0:%.*]] = getelementptr inbounds double, double* [[Q:%.*]], i64 0			; AVX-NEXT: [[Q0:%.*]] = getelementptr inbounds double, double* [[Q:%.*]], i64 0
	; AVX-NEXT: [[Q1:%.*]] = getelementptr inbounds double, double* [[Q]], i64 1			; AVX-NEXT: [[Q1:%.*]] = getelementptr inbounds double, double* [[Q]], i64 1
	; AVX-NEXT: [[Q2:%.*]] = getelementptr inbounds double, double* [[Q]], i64 2			; AVX-NEXT: [[Q2:%.*]] = getelementptr inbounds double, double* [[Q]], i64 2
	; AVX-NEXT: [[Q3:%.*]] = getelementptr inbounds double, double* [[Q]], i64 3			; AVX-NEXT: [[Q3:%.*]] = getelementptr inbounds double, double* [[Q]], i64 3
	; AVX-NEXT: [[Q4:%.*]] = getelementptr inbounds double, double* [[Q]], i64 4			; AVX-NEXT: [[Q4:%.*]] = getelementptr inbounds double, double* [[Q]], i64 4
	; AVX-NEXT: [[Q5:%.*]] = getelementptr inbounds double, double* [[Q]], i64 5			; AVX-NEXT: [[Q5:%.*]] = getelementptr inbounds double, double* [[Q]], i64 5
	; AVX-NEXT: [[TMP1:%.*]] = bitcast double* [[P0]] to <4 x double>*			; AVX-NEXT: [[TMP1:%.*]] = bitcast double* [[P0]] to <4 x double>*
	; AVX-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* [[TMP1]], align 8			; AVX-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* [[TMP1]], align 8
	; AVX-NEXT: [[TMP3:%.*]] = bitcast double* [[P4]] to <2 x double>*			; AVX-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP2]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>
	; AVX-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 8			; AVX-NEXT: [[TMP4:%.*]] = bitcast double* [[Q0]] to <4 x double>*
	; AVX-NEXT: [[TMP5:%.*]] = fadd <4 x double> [[TMP2]], <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>			; AVX-NEXT: store <4 x double> [[TMP3]], <4 x double>* [[TMP4]], align 8
	; AVX-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP4]], <double 1.000000e+00, double 1.000000e+00>			; AVX-NEXT: [[TMP5:%.*]] = bitcast double* [[P4]] to <2 x double>*
	; AVX-NEXT: [[TMP7:%.*]] = bitcast double* [[Q0]] to <4 x double>*			; AVX-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* [[TMP5]], align 8
	; AVX-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[TMP7]], align 8			; AVX-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP6]], <double 1.000000e+00, double 1.000000e+00>
	; AVX-NEXT: [[TMP8:%.*]] = bitcast double* [[Q4]] to <2 x double>*			; AVX-NEXT: [[TMP8:%.*]] = bitcast double* [[Q4]] to <2 x double>*
	; AVX-NEXT: store <2 x double> [[TMP6]], <2 x double>* [[TMP8]], align 8			; AVX-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%p0 = getelementptr inbounds double, double* %p, i64 0			%p0 = getelementptr inbounds double, double* %p, i64 0
	%p1 = getelementptr inbounds double, double* %p, i64 1			%p1 = getelementptr inbounds double, double* %p, i64 1
	%p2 = getelementptr inbounds double, double* %p, i64 2			%p2 = getelementptr inbounds double, double* %p, i64 2
	%p3 = getelementptr inbounds double, double* %p, i64 3			%p3 = getelementptr inbounds double, double* %p, i64 3
	%p4 = getelementptr inbounds double, double* %p, i64 4			%p4 = getelementptr inbounds double, double* %p, i64 4
	%p5 = getelementptr inbounds double, double* %p, i64 5			%p5 = getelementptr inbounds double, double* %p, i64 5

llvm/test/Transforms/SLPVectorizer/X86/dot-product.ll

	; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds double, double* [[PTRX]], i64 2			; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds double, double* [[PTRX]], i64 2
	; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds double, double* [[PTRY]], i64 2			; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds double, double* [[PTRY]], i64 2
	; CHECK-NEXT: [[PTRX3:%.*]] = getelementptr inbounds double, double* [[PTRX]], i64 3			; CHECK-NEXT: [[PTRX3:%.*]] = getelementptr inbounds double, double* [[PTRX]], i64 3
	; CHECK-NEXT: [[PTRY3:%.*]] = getelementptr inbounds double, double* [[PTRY]], i64 3			; CHECK-NEXT: [[PTRY3:%.*]] = getelementptr inbounds double, double* [[PTRY]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[PTRX]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[PTRX]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[PTRY]] to <2 x double>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[PTRY]] to <2 x double>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[PTRX2]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* [[TMP5]], align 4			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[PTRX2]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[PTRY2]] to <2 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = load <2 x double>, <2 x double>* [[TMP7]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[PTRY2]] to <2 x double>*
	; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP9:%.*]] = load <2 x double>, <2 x double>* [[TMP8]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP6]], [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP7]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x double> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x double> [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP5]], i32 1
	; CHECK-NEXT: [[DOT01:%.*]] = fadd double [[TMP11]], [[TMP12]]			; CHECK-NEXT: [[DOT01:%.*]] = fadd double [[TMP11]], [[TMP12]]
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x double> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x double> [[TMP10]], i32 0
	; CHECK-NEXT: [[DOT012:%.*]] = fadd double [[DOT01]], [[TMP13]]			; CHECK-NEXT: [[DOT012:%.*]] = fadd double [[DOT01]], [[TMP13]]
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x double> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x double> [[TMP10]], i32 1
	; CHECK-NEXT: [[DOT0123:%.*]] = fadd double [[DOT012]], [[TMP14]]			; CHECK-NEXT: [[DOT0123:%.*]] = fadd double [[DOT012]], [[TMP14]]
	; CHECK-NEXT: ret double [[DOT0123]]			; CHECK-NEXT: ret double [[DOT0123]]
	;			;
	%ptrx1 = getelementptr inbounds double, double* %ptrx, i64 1			%ptrx1 = getelementptr inbounds double, double* %ptrx, i64 1
	; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 2			; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 2
	; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 2			; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 2
	; CHECK-NEXT: [[PTRX3:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 3			; CHECK-NEXT: [[PTRX3:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 3
	; CHECK-NEXT: [[PTRY3:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 3			; CHECK-NEXT: [[PTRY3:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[PTRX]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[PTRX]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[PTRY]] to <2 x float>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[PTRY]] to <2 x float>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[PTRX2]] to <2 x float>*			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* [[TMP5]], align 4			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[PTRX2]] to <2 x float>*
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[PTRY2]] to <2 x float>*			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = load <2 x float>, <2 x float>* [[TMP7]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[PTRY2]] to <2 x float>*
	; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP9:%.*]] = load <2 x float>, <2 x float>* [[TMP8]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x float> [[TMP6]], [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x float> [[TMP7]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP5]], i32 1
	; CHECK-NEXT: [[DOT01:%.*]] = fadd float [[TMP11]], [[TMP12]]			; CHECK-NEXT: [[DOT01:%.*]] = fadd float [[TMP11]], [[TMP12]]
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP10]], i32 0
	; CHECK-NEXT: [[DOT012:%.*]] = fadd float [[DOT01]], [[TMP13]]			; CHECK-NEXT: [[DOT012:%.*]] = fadd float [[DOT01]], [[TMP13]]
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x float> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x float> [[TMP10]], i32 1
	; CHECK-NEXT: [[DOT0123:%.*]] = fadd float [[DOT012]], [[TMP14]]			; CHECK-NEXT: [[DOT0123:%.*]] = fadd float [[DOT012]], [[TMP14]]
	; CHECK-NEXT: ret float [[DOT0123]]			; CHECK-NEXT: ret float [[DOT0123]]
	;			;
	%ptrx1 = getelementptr inbounds float, float* %ptrx, i64 1			%ptrx1 = getelementptr inbounds float, float* %ptrx, i64 1
	define double @dot3f64(double* dereferenceable(32) %ptrx, double* dereferenceable(32) %ptry) {			define double @dot3f64(double* dereferenceable(32) %ptrx, double* dereferenceable(32) %ptry) {
	; CHECK-LABEL: @dot3f64(			; CHECK-LABEL: @dot3f64(
	; CHECK-NEXT: [[PTRX1:%.*]] = getelementptr inbounds double, double* [[PTRX:%.*]], i64 1			; CHECK-NEXT: [[PTRX1:%.*]] = getelementptr inbounds double, double* [[PTRX:%.*]], i64 1
	; CHECK-NEXT: [[PTRY1:%.*]] = getelementptr inbounds double, double* [[PTRY:%.*]], i64 1			; CHECK-NEXT: [[PTRY1:%.*]] = getelementptr inbounds double, double* [[PTRY:%.*]], i64 1
	; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds double, double* [[PTRX]], i64 2			; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds double, double* [[PTRX]], i64 2
	; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds double, double* [[PTRY]], i64 2			; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds double, double* [[PTRY]], i64 2
	; CHECK-NEXT: [[X0:%.*]] = load double, double* [[PTRX]], align 4			; CHECK-NEXT: [[X0:%.*]] = load double, double* [[PTRX]], align 4
	; CHECK-NEXT: [[Y0:%.*]] = load double, double* [[PTRY]], align 4			; CHECK-NEXT: [[Y0:%.*]] = load double, double* [[PTRY]], align 4
				; CHECK-NEXT: [[MUL0:%.*]] = fmul double [[X0]], [[Y0]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[PTRX1]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[PTRX1]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[PTRY1]] to <2 x double>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[PTRY1]] to <2 x double>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4
	; CHECK-NEXT: [[MUL0:%.*]] = fmul double [[X0]], [[Y0]]
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 0
	; CHECK-NEXT: [[DOT01:%.*]] = fadd double [[MUL0]], [[TMP6]]			; CHECK-NEXT: [[DOT01:%.*]] = fadd double [[MUL0]], [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP5]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP5]], i32 1
	; CHECK-NEXT: [[DOT012:%.*]] = fadd double [[DOT01]], [[TMP7]]			; CHECK-NEXT: [[DOT012:%.*]] = fadd double [[DOT01]], [[TMP7]]
	; CHECK-NEXT: ret double [[DOT012]]			; CHECK-NEXT: ret double [[DOT012]]
	;			;
	%ptrx1 = getelementptr inbounds double, double* %ptrx, i64 1			%ptrx1 = getelementptr inbounds double, double* %ptrx, i64 1
	define float @dot3f32(float* dereferenceable(16) %ptrx, float* dereferenceable(16) %ptry) {			define float @dot3f32(float* dereferenceable(16) %ptrx, float* dereferenceable(16) %ptry) {
	; CHECK-LABEL: @dot3f32(			; CHECK-LABEL: @dot3f32(
	; CHECK-NEXT: [[PTRX1:%.*]] = getelementptr inbounds float, float* [[PTRX:%.*]], i64 1			; CHECK-NEXT: [[PTRX1:%.*]] = getelementptr inbounds float, float* [[PTRX:%.*]], i64 1
	; CHECK-NEXT: [[PTRY1:%.*]] = getelementptr inbounds float, float* [[PTRY:%.*]], i64 1			; CHECK-NEXT: [[PTRY1:%.*]] = getelementptr inbounds float, float* [[PTRY:%.*]], i64 1
	; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 2			; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 2
	; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 2			; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 2
	; CHECK-NEXT: [[X0:%.*]] = load float, float* [[PTRX]], align 4			; CHECK-NEXT: [[X0:%.*]] = load float, float* [[PTRX]], align 4
	; CHECK-NEXT: [[Y0:%.*]] = load float, float* [[PTRY]], align 4			; CHECK-NEXT: [[Y0:%.*]] = load float, float* [[PTRY]], align 4
				; CHECK-NEXT: [[MUL0:%.*]] = fmul float [[X0]], [[Y0]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[PTRX1]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[PTRX1]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[PTRY1]] to <2 x float>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[PTRY1]] to <2 x float>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
	; CHECK-NEXT: [[MUL0:%.*]] = fmul float [[X0]], [[Y0]]
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0
	; CHECK-NEXT: [[DOT01:%.*]] = fadd float [[MUL0]], [[TMP6]]			; CHECK-NEXT: [[DOT01:%.*]] = fadd float [[MUL0]], [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 1
	; CHECK-NEXT: [[DOT012:%.*]] = fadd float [[DOT01]], [[TMP7]]			; CHECK-NEXT: [[DOT012:%.*]] = fadd float [[DOT01]], [[TMP7]]
	; CHECK-NEXT: ret float [[DOT012]]			; CHECK-NEXT: ret float [[DOT012]]
	;			;
	%ptrx1 = getelementptr inbounds float, float* %ptrx, i64 1			%ptrx1 = getelementptr inbounds float, float* %ptrx, i64 1
	define double @dot3f64_fast(double* dereferenceable(32) %ptrx, double* dereferenceable(32) %ptry) {			define double @dot3f64_fast(double* dereferenceable(32) %ptrx, double* dereferenceable(32) %ptry) {
	; CHECK-LABEL: @dot3f64_fast(			; CHECK-LABEL: @dot3f64_fast(
	; CHECK-NEXT: [[PTRX1:%.*]] = getelementptr inbounds double, double* [[PTRX:%.*]], i64 1			; CHECK-NEXT: [[PTRX1:%.*]] = getelementptr inbounds double, double* [[PTRX:%.*]], i64 1
	; CHECK-NEXT: [[PTRY1:%.*]] = getelementptr inbounds double, double* [[PTRY:%.*]], i64 1			; CHECK-NEXT: [[PTRY1:%.*]] = getelementptr inbounds double, double* [[PTRY:%.*]], i64 1
	; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds double, double* [[PTRX]], i64 2			; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds double, double* [[PTRX]], i64 2
	; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds double, double* [[PTRY]], i64 2			; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds double, double* [[PTRY]], i64 2
	; CHECK-NEXT: [[X0:%.*]] = load double, double* [[PTRX]], align 4			; CHECK-NEXT: [[X0:%.*]] = load double, double* [[PTRX]], align 4
	; CHECK-NEXT: [[Y0:%.*]] = load double, double* [[PTRY]], align 4			; CHECK-NEXT: [[Y0:%.*]] = load double, double* [[PTRY]], align 4
				; CHECK-NEXT: [[MUL0:%.*]] = fmul double [[X0]], [[Y0]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[PTRX1]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[PTRX1]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[PTRY1]] to <2 x double>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[PTRY1]] to <2 x double>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4
	; CHECK-NEXT: [[MUL0:%.*]] = fmul double [[X0]], [[Y0]]
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 0
	; CHECK-NEXT: [[DOT01:%.*]] = fadd fast double [[MUL0]], [[TMP6]]			; CHECK-NEXT: [[DOT01:%.*]] = fadd fast double [[MUL0]], [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP5]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP5]], i32 1
	; CHECK-NEXT: [[DOT012:%.*]] = fadd fast double [[DOT01]], [[TMP7]]			; CHECK-NEXT: [[DOT012:%.*]] = fadd fast double [[DOT01]], [[TMP7]]
	; CHECK-NEXT: ret double [[DOT012]]			; CHECK-NEXT: ret double [[DOT012]]
	;			;
	%ptrx1 = getelementptr inbounds double, double* %ptrx, i64 1			%ptrx1 = getelementptr inbounds double, double* %ptrx, i64 1
	define float @dot3f32_fast(float* dereferenceable(16) %ptrx, float* dereferenceable(16) %ptry) {			define float @dot3f32_fast(float* dereferenceable(16) %ptrx, float* dereferenceable(16) %ptry) {
	; CHECK-LABEL: @dot3f32_fast(			; CHECK-LABEL: @dot3f32_fast(
	; CHECK-NEXT: [[PTRX1:%.*]] = getelementptr inbounds float, float* [[PTRX:%.*]], i64 1			; CHECK-NEXT: [[PTRX1:%.*]] = getelementptr inbounds float, float* [[PTRX:%.*]], i64 1
	; CHECK-NEXT: [[PTRY1:%.*]] = getelementptr inbounds float, float* [[PTRY:%.*]], i64 1			; CHECK-NEXT: [[PTRY1:%.*]] = getelementptr inbounds float, float* [[PTRY:%.*]], i64 1
	; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 2			; CHECK-NEXT: [[PTRX2:%.*]] = getelementptr inbounds float, float* [[PTRX]], i64 2
	; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 2			; CHECK-NEXT: [[PTRY2:%.*]] = getelementptr inbounds float, float* [[PTRY]], i64 2
	; CHECK-NEXT: [[X0:%.*]] = load float, float* [[PTRX]], align 4			; CHECK-NEXT: [[X0:%.*]] = load float, float* [[PTRX]], align 4
	; CHECK-NEXT: [[Y0:%.*]] = load float, float* [[PTRY]], align 4			; CHECK-NEXT: [[Y0:%.*]] = load float, float* [[PTRY]], align 4
				; CHECK-NEXT: [[MUL0:%.*]] = fmul float [[X0]], [[Y0]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[PTRX1]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[PTRX1]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[PTRY1]] to <2 x float>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[PTRY1]] to <2 x float>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
	; CHECK-NEXT: [[MUL0:%.*]] = fmul float [[X0]], [[Y0]]
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0
	; CHECK-NEXT: [[DOT01:%.*]] = fadd fast float [[MUL0]], [[TMP6]]			; CHECK-NEXT: [[DOT01:%.*]] = fadd fast float [[MUL0]], [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 1
	; CHECK-NEXT: [[DOT012:%.*]] = fadd fast float [[DOT01]], [[TMP7]]			; CHECK-NEXT: [[DOT012:%.*]] = fadd fast float [[DOT01]], [[TMP7]]
	; CHECK-NEXT: ret float [[DOT012]]			; CHECK-NEXT: ret float [[DOT012]]
	;			;
	%ptrx1 = getelementptr inbounds float, float* %ptrx, i64 1			%ptrx1 = getelementptr inbounds float, float* %ptrx, i64 1

llvm/test/Transforms/SLPVectorizer/X86/extract_in_tree_user.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=i386-apple-macosx10.9.0 -mcpu=corei7-avx | FileCheck %s		; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=i386-apple-macosx10.9.0 -mcpu=corei7-avx | FileCheck %s

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

@a = common global i64* null, align 8		@a = common global i64* null, align 8

; Function Attrs: nounwind ssp uwtable		; Function Attrs: nounwind ssp uwtable
define i32 @fn1() {		define i32 @fn1() {
; CHECK-LABEL: @fn1(		; CHECK-LABEL: @fn1(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** @a, align 8		; CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** @a, align 8
		; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[TMP0]], i64 12
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64*> poison, i64* [[TMP0]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64*> poison, i64* [[TMP0]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64*> [[TMP1]], i64* [[TMP0]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64*> [[TMP1]], i64* [[TMP0]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, <2 x i64*> [[TMP2]], <2 x i64> <i64 11, i64 56>		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, <2 x i64*> [[TMP2]], <2 x i64> <i64 11, i64 56>
; CHECK-NEXT: [[TMP4:%.*]] = ptrtoint <2 x i64*> [[TMP3]] to <2 x i64>		; CHECK-NEXT: [[TMP4:%.*]] = ptrtoint <2 x i64*> [[TMP3]] to <2 x i64>
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[TMP0]], i64 12
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i64*> [[TMP3]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i64*> [[TMP3]], i32 0
; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*		; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*
; CHECK-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP6]], align 8		; CHECK-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP6]], align 8
; CHECK-NEXT: ret i32 undef		; CHECK-NEXT: ret i32 undef
;		;
entry:		entry:
%0 = load i64*, i64** @a, align 8		%0 = load i64*, i64** @a, align 8
%add.ptr = getelementptr inbounds i64, i64* %0, i64 11		%add.ptr = getelementptr inbounds i64, i64* %0, i64 11
define void @fn2(i32* %a, i32* %b, float* %c) {		define void @fn2(i32* %a, i32* %b, float* %c) {
; CHECK-LABEL: @fn2(		; CHECK-LABEL: @fn2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 1		; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 1
; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i32 1		; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i32 1
; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 2		; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 2
; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[B]], i32 2		; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[B]], i32 2
; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 3		; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 3
		; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[B]], i32 3
		; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds float, float* [[C:%.*]], i32 1
		; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[C]], i32 2
		; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds float, float* [[C]], i32 3
; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[A]] to <4 x i32>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[A]] to <4 x i32>*
; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[B]], i32 3
; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*		; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[B]] to <4 x i32>*
; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP1]], [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP1]], [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = sitofp <4 x i32> [[TMP4]] to <4 x float>		; CHECK-NEXT: [[TMP5:%.*]] = sitofp <4 x i32> [[TMP4]] to <4 x float>
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = call <4 x float> @llvm.powi.v4f32.i32(<4 x float> [[TMP5]], i32 [[TMP6]])		; CHECK-NEXT: [[TMP7:%.*]] = call <4 x float> @llvm.powi.v4f32.i32(<4 x float> [[TMP5]], i32 [[TMP6]])
; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds float, float* [[C:%.*]], i32 1
; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[C]], i32 2
; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds float, float* [[C]], i32 3
; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[C]] to <4 x float>*		; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[C]] to <4 x float>*
; CHECK-NEXT: store <4 x float> [[TMP7]], <4 x float>* [[TMP8]], align 4		; CHECK-NEXT: store <4 x float> [[TMP7]], <4 x float>* [[TMP8]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%i0 = load i32, i32* %a, align 4		%i0 = load i32, i32* %a, align 4
%i1 = load i32, i32* %b, align 4		%i1 = load i32, i32* %b, align 4
%add1 = add i32 %i0, %i1		%add1 = add i32 %i0, %i1
ret void		ret void

}		}

define void @externally_used_ptrs() {		define void @externally_used_ptrs() {
; CHECK-LABEL: @externally_used_ptrs(		; CHECK-LABEL: @externally_used_ptrs(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** @a, align 8		; CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** @a, align 8
		; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[TMP0]], i64 12
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64*> poison, i64* [[TMP0]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64*> poison, i64* [[TMP0]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64*> [[TMP1]], i64* [[TMP0]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64*> [[TMP1]], i64* [[TMP0]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, <2 x i64*> [[TMP2]], <2 x i64> <i64 56, i64 11>		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, <2 x i64*> [[TMP2]], <2 x i64> <i64 56, i64 11>
; CHECK-NEXT: [[TMP4:%.*]] = ptrtoint <2 x i64*> [[TMP3]] to <2 x i64>		; CHECK-NEXT: [[TMP4:%.*]] = ptrtoint <2 x i64*> [[TMP3]] to <2 x i64>
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[TMP0]], i64 12
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i64*> [[TMP3]], i32 1		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i64*> [[TMP3]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*		; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* [[TMP6]], align 8		; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* [[TMP6]], align 8
; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i64> [[TMP4]], [[TMP7]]		; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i64> [[TMP4]], [[TMP7]]
; CHECK-NEXT: [[TMP9:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*		; CHECK-NEXT: [[TMP9:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*
; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64>* [[TMP9]], align 8		; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64>* [[TMP9]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;

llvm/test/Transforms/SLPVectorizer/X86/fabs.ll

store double %fabs0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8		store double %fabs0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8
store double %fabs1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %fabs1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @fabs_4f64() #0 {		define void @fabs_4f64() #0 {
; SSE-LABEL: @fabs_4f64(		; SSE-LABEL: @fabs_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP1]])		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP3]])
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fabs_4f64(		; AVX-LABEL: @fabs_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.fabs.v4f64(<4 x double> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.fabs.v4f64(<4 x double> [[TMP1]])
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store double %fabs2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8		store double %fabs2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8
store double %fabs3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %fabs3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @fabs_8f64() #0 {		define void @fabs_8f64() #0 {
; SSE-LABEL: @fabs_8f64(		; SSE-LABEL: @fabs_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP1]])		; SSE-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP3]])
; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP2]])		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP3]])		; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP4]])		; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP5]])
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP7]])
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @fabs_8f64(		; AVX256-LABEL: @fabs_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.fabs.v4f64(<4 x double> [[TMP1]])
; AVX256-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.fabs.v4f64(<4 x double> [[TMP1]])		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.fabs.v4f64(<4 x double> [[TMP2]])		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.fabs.v4f64(<4 x double> [[TMP3]])
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @fabs_8f64(		; AVX512-LABEL: @fabs_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.fabs.v8f64(<8 x double> [[TMP1]])		; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.fabs.v8f64(<8 x double> [[TMP1]])
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 4		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
store float %fabs2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4		store float %fabs2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4
store float %fabs3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %fabs3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @fabs_8f32() #0 {		define void @fabs_8f32() #0 {
; SSE-LABEL: @fabs_8f32(		; SSE-LABEL: @fabs_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP1]])		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP3]])
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fabs_8f32(		; AVX-LABEL: @fabs_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.fabs.v8f32(<8 x float> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.fabs.v8f32(<8 x float> [[TMP1]])
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store float %fabs6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4		store float %fabs6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4
store float %fabs7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %fabs7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @fabs_16f32() #0 {		define void @fabs_16f32() #0 {
; SSE-LABEL: @fabs_16f32(		; SSE-LABEL: @fabs_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP1]])		; SSE-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP3]])
; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP2]])		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP3]])		; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP4]])		; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP5]])
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> [[TMP7]])
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @fabs_16f32(		; AVX256-LABEL: @fabs_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.fabs.v8f32(<8 x float> [[TMP1]])
; AVX256-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.fabs.v8f32(<8 x float> [[TMP1]])		; AVX256-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.fabs.v8f32(<8 x float> [[TMP2]])		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.fabs.v8f32(<8 x float> [[TMP3]])
; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @fabs_16f32(		; AVX512-LABEL: @fabs_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.fabs.v16f32(<16 x float> [[TMP1]])		; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.fabs.v16f32(<16 x float> [[TMP1]])
; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4		; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/fcopysign.ll

store double %fcopysign0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8		store double %fcopysign0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8
store double %fcopysign1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %fcopysign1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @fcopysign_4f64() #0 {		define void @fcopysign_4f64() #0 {
; SSE-LABEL: @fcopysign_4f64(		; SSE-LABEL: @fcopysign_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP3]])		; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP2]], <2 x double> [[TMP4]])		; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP4]], <2 x double> [[TMP5]])
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fcopysign_4f64(		; AVX-LABEL: @fcopysign_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.copysign.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]])		; AVX-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.copysign.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]])
; AVX-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8		; AVX-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
store double %fcopysign2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8		store double %fcopysign2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8
store double %fcopysign3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %fcopysign3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @fcopysign_8f64() #0 {		define void @fcopysign_8f64() #0 {
; SSE-LABEL: @fcopysign_8f64(		; SSE-LABEL: @fcopysign_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP4]], <2 x double> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP2]], <2 x double> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP3]], <2 x double> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP7]], <2 x double> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP4]], <2 x double> [[TMP8]])		; SSE-NEXT: store <2 x double> [[TMP9]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP9]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP10]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP11]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <2 x double> @llvm.copysign.v2f64(<2 x double> [[TMP10]], <2 x double> [[TMP11]])
; SSE-NEXT: store <2 x double> [[TMP12]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP12]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @fcopysign_8f64(		; AVX256-LABEL: @fcopysign_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.copysign.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]])
; AVX256-NEXT: [[TMP4:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP5:%.*]] = call <4 x double> @llvm.copysign.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP3]])		; AVX256-NEXT: [[TMP4:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: [[TMP6:%.*]] = call <4 x double> @llvm.copysign.v4f64(<4 x double> [[TMP2]], <4 x double> [[TMP4]])		; AVX256-NEXT: [[TMP5:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: store <4 x double> [[TMP5]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP6:%.*]] = call <4 x double> @llvm.copysign.v4f64(<4 x double> [[TMP4]], <4 x double> [[TMP5]])
; AVX256-NEXT: store <4 x double> [[TMP6]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: store <4 x double> [[TMP6]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @fcopysign_8f64(		; AVX512-LABEL: @fcopysign_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcA64 to <8 x double>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcA64 to <8 x double>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcB64 to <8 x double>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcB64 to <8 x double>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x double> @llvm.copysign.v8f64(<8 x double> [[TMP1]], <8 x double> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x double> @llvm.copysign.v8f64(<8 x double> [[TMP1]], <8 x double> [[TMP2]])
; AVX512-NEXT: store <8 x double> [[TMP3]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 4		; AVX512-NEXT: store <8 x double> [[TMP3]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 4
[... 59 lines not shown ...]
store float %fcopysign2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4		store float %fcopysign2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4
store float %fcopysign3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %fcopysign3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @fcopysign_8f32() #0 {		define void @fcopysign_8f32() #0 {
; SSE-LABEL: @fcopysign_8f32(		; SSE-LABEL: @fcopysign_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP3]])		; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP2]], <4 x float> [[TMP4]])		; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP4]], <4 x float> [[TMP5]])
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fcopysign_8f32(		; AVX-LABEL: @fcopysign_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.copysign.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]])		; AVX-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.copysign.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]])
; AVX-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4		; AVX-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
[... 32 lines not shown ...]
store float %fcopysign6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4		store float %fcopysign6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4
store float %fcopysign7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %fcopysign7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @fcopysign_16f32() #0 {		define void @fcopysign_16f32() #0 {
; SSE-LABEL: @fcopysign_16f32(		; SSE-LABEL: @fcopysign_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP4]], <4 x float> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP2]], <4 x float> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP3]], <4 x float> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP7]], <4 x float> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP4]], <4 x float> [[TMP8]])		; SSE-NEXT: store <4 x float> [[TMP9]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP9]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP10]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP11]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x float> @llvm.copysign.v4f32(<4 x float> [[TMP10]], <4 x float> [[TMP11]])
; SSE-NEXT: store <4 x float> [[TMP12]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP12]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @fcopysign_16f32(		; AVX256-LABEL: @fcopysign_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.copysign.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]])
; AVX256-NEXT: [[TMP4:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP5:%.*]] = call <8 x float> @llvm.copysign.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP3]])		; AVX256-NEXT: [[TMP4:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: [[TMP6:%.*]] = call <8 x float> @llvm.copysign.v8f32(<8 x float> [[TMP2]], <8 x float> [[TMP4]])		; AVX256-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: store <8 x float> [[TMP5]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP6:%.*]] = call <8 x float> @llvm.copysign.v8f32(<8 x float> [[TMP4]], <8 x float> [[TMP5]])
; AVX256-NEXT: store <8 x float> [[TMP6]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: store <8 x float> [[TMP6]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @fcopysign_16f32(		; AVX512-LABEL: @fcopysign_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcA32 to <16 x float>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcA32 to <16 x float>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcB32 to <16 x float>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcB32 to <16 x float>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x float> @llvm.copysign.v16f32(<16 x float> [[TMP1]], <16 x float> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x float> @llvm.copysign.v16f32(<16 x float> [[TMP1]], <16 x float> [[TMP2]])
; AVX512-NEXT: store <16 x float> [[TMP3]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4		; AVX512-NEXT: store <16 x float> [[TMP3]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4
[... 70 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/fma.ll

[... 154 lines not shown ...]
	; NO-FMA-NEXT: store double [[FMA4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 4			; NO-FMA-NEXT: store double [[FMA4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 4
	; NO-FMA-NEXT: store double [[FMA5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 4			; NO-FMA-NEXT: store double [[FMA5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 4
	; NO-FMA-NEXT: store double [[FMA6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 4			; NO-FMA-NEXT: store double [[FMA6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 4
	; NO-FMA-NEXT: store double [[FMA7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 4			; NO-FMA-NEXT: store double [[FMA7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 4
	; NO-FMA-NEXT: ret void			; NO-FMA-NEXT: ret void
	;			;
	; FMA256-LABEL: @fma_8f64(			; FMA256-LABEL: @fma_8f64(
	; FMA256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 4			; FMA256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 4
	; FMA256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <4 x double>*), align 4			; FMA256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 4
	; FMA256-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 4			; FMA256-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcC64 to <4 x double>*), align 4
	; FMA256-NEXT: [[TMP4:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <4 x double>*), align 4			; FMA256-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.fma.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]], <4 x double> [[TMP3]])
	; FMA256-NEXT: [[TMP5:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcC64 to <4 x double>*), align 4			; FMA256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4
	; FMA256-NEXT: [[TMP6:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 4) to <4 x double>*), align 4			; FMA256-NEXT: [[TMP5:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <4 x double>*), align 4
	; FMA256-NEXT: [[TMP7:%.*]] = call <4 x double> @llvm.fma.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP3]], <4 x double> [[TMP5]])			; FMA256-NEXT: [[TMP6:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <4 x double>*), align 4
	; FMA256-NEXT: [[TMP8:%.*]] = call <4 x double> @llvm.fma.v4f64(<4 x double> [[TMP2]], <4 x double> [[TMP4]], <4 x double> [[TMP6]])			; FMA256-NEXT: [[TMP7:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 4) to <4 x double>*), align 4
	; FMA256-NEXT: store <4 x double> [[TMP7]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4			; FMA256-NEXT: [[TMP8:%.*]] = call <4 x double> @llvm.fma.v4f64(<4 x double> [[TMP5]], <4 x double> [[TMP6]], <4 x double> [[TMP7]])
	; FMA256-NEXT: store <4 x double> [[TMP8]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4			; FMA256-NEXT: store <4 x double> [[TMP8]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4
	; FMA256-NEXT: ret void			; FMA256-NEXT: ret void
	;			;
	; FMA512-LABEL: @fma_8f64(			; FMA512-LABEL: @fma_8f64(
	; FMA512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcA64 to <8 x double>*), align 4			; FMA512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcA64 to <8 x double>*), align 4
	; FMA512-NEXT: [[TMP2:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcB64 to <8 x double>*), align 4			; FMA512-NEXT: [[TMP2:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcB64 to <8 x double>*), align 4
	; FMA512-NEXT: [[TMP3:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcC64 to <8 x double>*), align 4			; FMA512-NEXT: [[TMP3:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcC64 to <8 x double>*), align 4
	; FMA512-NEXT: [[TMP4:%.*]] = call <8 x double> @llvm.fma.v8f64(<8 x double> [[TMP1]], <8 x double> [[TMP2]], <8 x double> [[TMP3]])			; FMA512-NEXT: [[TMP4:%.*]] = call <8 x double> @llvm.fma.v8f64(<8 x double> [[TMP1]], <8 x double> [[TMP2]], <8 x double> [[TMP3]])
[... 274 lines not shown ...]
	; NO-FMA-NEXT: store float [[FMA12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4			; NO-FMA-NEXT: store float [[FMA12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4
	; NO-FMA-NEXT: store float [[FMA13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4			; NO-FMA-NEXT: store float [[FMA13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4
	; NO-FMA-NEXT: store float [[FMA14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4			; NO-FMA-NEXT: store float [[FMA14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4
	; NO-FMA-NEXT: store float [[FMA15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4			; NO-FMA-NEXT: store float [[FMA15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4
	; NO-FMA-NEXT: ret void			; NO-FMA-NEXT: ret void
	;			;
	; FMA256-LABEL: @fma_16f32(			; FMA256-LABEL: @fma_16f32(
	; FMA256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4			; FMA256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4
	; FMA256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <8 x float>*), align 4			; FMA256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4
	; FMA256-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4			; FMA256-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcC32 to <8 x float>*), align 4
	; FMA256-NEXT: [[TMP4:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <8 x float>*), align 4			; FMA256-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.fma.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x float> [[TMP3]])
	; FMA256-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcC32 to <8 x float>*), align 4			; FMA256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; FMA256-NEXT: [[TMP6:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 8) to <8 x float>*), align 4			; FMA256-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <8 x float>*), align 4
	; FMA256-NEXT: [[TMP7:%.*]] = call <8 x float> @llvm.fma.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP3]], <8 x float> [[TMP5]])			; FMA256-NEXT: [[TMP6:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <8 x float>*), align 4
	; FMA256-NEXT: [[TMP8:%.*]] = call <8 x float> @llvm.fma.v8f32(<8 x float> [[TMP2]], <8 x float> [[TMP4]], <8 x float> [[TMP6]])			; FMA256-NEXT: [[TMP7:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 8) to <8 x float>*), align 4
	; FMA256-NEXT: store <8 x float> [[TMP7]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; FMA256-NEXT: [[TMP8:%.*]] = call <8 x float> @llvm.fma.v8f32(<8 x float> [[TMP5]], <8 x float> [[TMP6]], <8 x float> [[TMP7]])
	; FMA256-NEXT: store <8 x float> [[TMP8]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4			; FMA256-NEXT: store <8 x float> [[TMP8]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
	; FMA256-NEXT: ret void			; FMA256-NEXT: ret void
	;			;
	; FMA512-LABEL: @fma_16f32(			; FMA512-LABEL: @fma_16f32(
	; FMA512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcA32 to <16 x float>*), align 4			; FMA512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcA32 to <16 x float>*), align 4
	; FMA512-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcB32 to <16 x float>*), align 4			; FMA512-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcB32 to <16 x float>*), align 4
	; FMA512-NEXT: [[TMP3:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcC32 to <16 x float>*), align 4			; FMA512-NEXT: [[TMP3:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcC32 to <16 x float>*), align 4
	; FMA512-NEXT: [[TMP4:%.*]] = call <16 x float> @llvm.fma.v16f32(<16 x float> [[TMP1]], <16 x float> [[TMP2]], <16 x float> [[TMP3]])			; FMA512-NEXT: [[TMP4:%.*]] = call <16 x float> @llvm.fma.v16f32(<16 x float> [[TMP1]], <16 x float> [[TMP2]], <16 x float> [[TMP3]])
[... 87 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/fmaxnum.ll

[... 38 lines not shown ...]
store double %fmaxnum0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8		store double %fmaxnum0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8
store double %fmaxnum1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %fmaxnum1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @fmaxnum_4f64() #0 {		define void @fmaxnum_4f64() #0 {
; SSE-LABEL: @fmaxnum_4f64(		; SSE-LABEL: @fmaxnum_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP3]])		; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP2]], <2 x double> [[TMP4]])		; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP4]], <2 x double> [[TMP5]])
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fmaxnum_4f64(		; AVX-LABEL: @fmaxnum_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.maxnum.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]])		; AVX-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.maxnum.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]])
; AVX-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8		; AVX-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
[... 16 lines not shown ...]
store double %fmaxnum2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8		store double %fmaxnum2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8
store double %fmaxnum3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %fmaxnum3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @fmaxnum_8f64() #0 {		define void @fmaxnum_8f64() #0 {
; SSE-LABEL: @fmaxnum_8f64(		; SSE-LABEL: @fmaxnum_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP4]], <2 x double> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP2]], <2 x double> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP3]], <2 x double> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP7]], <2 x double> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP4]], <2 x double> [[TMP8]])		; SSE-NEXT: store <2 x double> [[TMP9]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP9]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP10]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP11]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <2 x double> @llvm.maxnum.v2f64(<2 x double> [[TMP10]], <2 x double> [[TMP11]])
; SSE-NEXT: store <2 x double> [[TMP12]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP12]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @fmaxnum_8f64(		; AVX256-LABEL: @fmaxnum_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.maxnum.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]])
; AVX256-NEXT: [[TMP4:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP5:%.*]] = call <4 x double> @llvm.maxnum.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP3]])		; AVX256-NEXT: [[TMP4:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: [[TMP6:%.*]] = call <4 x double> @llvm.maxnum.v4f64(<4 x double> [[TMP2]], <4 x double> [[TMP4]])		; AVX256-NEXT: [[TMP5:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: store <4 x double> [[TMP5]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP6:%.*]] = call <4 x double> @llvm.maxnum.v4f64(<4 x double> [[TMP4]], <4 x double> [[TMP5]])
; AVX256-NEXT: store <4 x double> [[TMP6]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: store <4 x double> [[TMP6]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @fmaxnum_8f64(		; AVX512-LABEL: @fmaxnum_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcA64 to <8 x double>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcA64 to <8 x double>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcB64 to <8 x double>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcB64 to <8 x double>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x double> @llvm.maxnum.v8f64(<8 x double> [[TMP1]], <8 x double> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x double> @llvm.maxnum.v8f64(<8 x double> [[TMP1]], <8 x double> [[TMP2]])
; AVX512-NEXT: store <8 x double> [[TMP3]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 4		; AVX512-NEXT: store <8 x double> [[TMP3]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 4
store float %fmaxnum2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4		store float %fmaxnum2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4
store float %fmaxnum3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %fmaxnum3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @fmaxnum_8f32() #0 {		define void @fmaxnum_8f32() #0 {
; SSE-LABEL: @fmaxnum_8f32(		; SSE-LABEL: @fmaxnum_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP3]])		; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP2]], <4 x float> [[TMP4]])		; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP4]], <4 x float> [[TMP5]])
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fmaxnum_8f32(		; AVX-LABEL: @fmaxnum_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.maxnum.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]])		; AVX-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.maxnum.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]])
; AVX-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4		; AVX-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
store float %fmaxnum6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4		store float %fmaxnum6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4
store float %fmaxnum7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %fmaxnum7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @fmaxnum_16f32() #0 {		define void @fmaxnum_16f32() #0 {
; SSE-LABEL: @fmaxnum_16f32(		; SSE-LABEL: @fmaxnum_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP4]], <4 x float> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP2]], <4 x float> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP3]], <4 x float> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP7]], <4 x float> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP4]], <4 x float> [[TMP8]])		; SSE-NEXT: store <4 x float> [[TMP9]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP9]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP10]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP11]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[TMP10]], <4 x float> [[TMP11]])
; SSE-NEXT: store <4 x float> [[TMP12]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP12]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @fmaxnum_16f32(		; AVX256-LABEL: @fmaxnum_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.maxnum.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]])
; AVX256-NEXT: [[TMP4:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP5:%.*]] = call <8 x float> @llvm.maxnum.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP3]])		; AVX256-NEXT: [[TMP4:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: [[TMP6:%.*]] = call <8 x float> @llvm.maxnum.v8f32(<8 x float> [[TMP2]], <8 x float> [[TMP4]])		; AVX256-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: store <8 x float> [[TMP5]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP6:%.*]] = call <8 x float> @llvm.maxnum.v8f32(<8 x float> [[TMP4]], <8 x float> [[TMP5]])
; AVX256-NEXT: store <8 x float> [[TMP6]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: store <8 x float> [[TMP6]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @fmaxnum_16f32(		; AVX512-LABEL: @fmaxnum_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcA32 to <16 x float>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcA32 to <16 x float>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcB32 to <16 x float>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcB32 to <16 x float>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x float> @llvm.maxnum.v16f32(<16 x float> [[TMP1]], <16 x float> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x float> @llvm.maxnum.v16f32(<16 x float> [[TMP1]], <16 x float> [[TMP2]])
; AVX512-NEXT: store <16 x float> [[TMP3]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4		; AVX512-NEXT: store <16 x float> [[TMP3]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4

llvm/test/Transforms/SLPVectorizer/X86/fminnum.ll

store double %fminnum0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8		store double %fminnum0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8
store double %fminnum1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %fminnum1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @fminnum_4f64() #0 {		define void @fminnum_4f64() #0 {
; SSE-LABEL: @fminnum_4f64(		; SSE-LABEL: @fminnum_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP3]])		; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP2]], <2 x double> [[TMP4]])		; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP4]], <2 x double> [[TMP5]])
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fminnum_4f64(		; AVX-LABEL: @fminnum_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.minnum.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]])		; AVX-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.minnum.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]])
; AVX-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8		; AVX-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
store double %fminnum2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8		store double %fminnum2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8
store double %fminnum3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %fminnum3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @fminnum_8f64() #0 {		define void @fminnum_8f64() #0 {
; SSE-LABEL: @fminnum_8f64(		; SSE-LABEL: @fminnum_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP4]], <2 x double> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP2]], <2 x double> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP3]], <2 x double> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP7]], <2 x double> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP4]], <2 x double> [[TMP8]])		; SSE-NEXT: store <2 x double> [[TMP9]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP9]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP10]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP11]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <2 x double> @llvm.minnum.v2f64(<2 x double> [[TMP10]], <2 x double> [[TMP11]])
; SSE-NEXT: store <2 x double> [[TMP12]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP12]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @fminnum_8f64(		; AVX256-LABEL: @fminnum_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.minnum.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]])
; AVX256-NEXT: [[TMP4:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP5:%.*]] = call <4 x double> @llvm.minnum.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP3]])		; AVX256-NEXT: [[TMP4:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: [[TMP6:%.*]] = call <4 x double> @llvm.minnum.v4f64(<4 x double> [[TMP2]], <4 x double> [[TMP4]])		; AVX256-NEXT: [[TMP5:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: store <4 x double> [[TMP5]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP6:%.*]] = call <4 x double> @llvm.minnum.v4f64(<4 x double> [[TMP4]], <4 x double> [[TMP5]])
; AVX256-NEXT: store <4 x double> [[TMP6]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: store <4 x double> [[TMP6]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @fminnum_8f64(		; AVX512-LABEL: @fminnum_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcA64 to <8 x double>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcA64 to <8 x double>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcB64 to <8 x double>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcB64 to <8 x double>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <8 x double> @llvm.minnum.v8f64(<8 x double> [[TMP1]], <8 x double> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <8 x double> @llvm.minnum.v8f64(<8 x double> [[TMP1]], <8 x double> [[TMP2]])
; AVX512-NEXT: store <8 x double> [[TMP3]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 4		; AVX512-NEXT: store <8 x double> [[TMP3]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 4
store float %fminnum2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4		store float %fminnum2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4
store float %fminnum3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %fminnum3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @fminnum_8f32() #0 {		define void @fminnum_8f32() #0 {
; SSE-LABEL: @fminnum_8f32(		; SSE-LABEL: @fminnum_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP3]])		; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP2]], <4 x float> [[TMP4]])		; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP4]], <4 x float> [[TMP5]])
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fminnum_8f32(		; AVX-LABEL: @fminnum_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.minnum.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]])		; AVX-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.minnum.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]])
; AVX-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4		; AVX-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
store float %fminnum6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4		store float %fminnum6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4
store float %fminnum7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %fminnum7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @fminnum_16f32() #0 {		define void @fminnum_16f32() #0 {
; SSE-LABEL: @fminnum_16f32(		; SSE-LABEL: @fminnum_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP2]])
; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP4]], <4 x float> [[TMP5]])
; SSE-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP5]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP2]], <4 x float> [[TMP6]])		; SSE-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP3]], <4 x float> [[TMP7]])		; SSE-NEXT: [[TMP9:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP7]], <4 x float> [[TMP8]])
; SSE-NEXT: [[TMP12:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP4]], <4 x float> [[TMP8]])		; SSE-NEXT: store <4 x float> [[TMP9]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP9]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP10]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP11]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x float> @llvm.minnum.v4f32(<4 x float> [[TMP10]], <4 x float> [[TMP11]])
; SSE-NEXT: store <4 x float> [[TMP12]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP12]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @fminnum_16f32(		; AVX256-LABEL: @fminnum_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.minnum.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]])
; AVX256-NEXT: [[TMP4:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP5:%.*]] = call <8 x float> @llvm.minnum.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP3]])		; AVX256-NEXT: [[TMP4:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: [[TMP6:%.*]] = call <8 x float> @llvm.minnum.v8f32(<8 x float> [[TMP2]], <8 x float> [[TMP4]])		; AVX256-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: store <8 x float> [[TMP5]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP6:%.*]] = call <8 x float> @llvm.minnum.v8f32(<8 x float> [[TMP4]], <8 x float> [[TMP5]])
; AVX256-NEXT: store <8 x float> [[TMP6]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: store <8 x float> [[TMP6]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @fminnum_16f32(		; AVX512-LABEL: @fminnum_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcA32 to <16 x float>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcA32 to <16 x float>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcB32 to <16 x float>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcB32 to <16 x float>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = call <16 x float> @llvm.minnum.v16f32(<16 x float> [[TMP1]], <16 x float> [[TMP2]])		; AVX512-NEXT: [[TMP3:%.*]] = call <16 x float> @llvm.minnum.v16f32(<16 x float> [[TMP1]], <16 x float> [[TMP2]])
; AVX512-NEXT: store <16 x float> [[TMP3]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4		; AVX512-NEXT: store <16 x float> [[TMP3]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4

llvm/test/Transforms/SLPVectorizer/X86/fmuladd.ll

store double %fmuladd0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8		store double %fmuladd0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8
store double %fmuladd1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %fmuladd1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @fmuladd_4f64() #0 {		define void @fmuladd_4f64() #0 {
; SSE-LABEL: @fmuladd_4f64(		; SSE-LABEL: @fmuladd_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcC64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP2]], <2 x double> [[TMP3]])
; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcC64 to <2 x double>*), align 8		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: [[TMP7:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP3]], <2 x double> [[TMP5]])		; SSE-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP2]], <2 x double> [[TMP4]], <2 x double> [[TMP6]])		; SSE-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP5]], <2 x double> [[TMP6]], <2 x double> [[TMP7]])
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fmuladd_4f64(		; AVX-LABEL: @fmuladd_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcC64 to <4 x double>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcC64 to <4 x double>*), align 8
; AVX-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.fmuladd.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]], <4 x double> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.fmuladd.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]], <4 x double> [[TMP3]])
store double %fmuladd2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8		store double %fmuladd2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8
store double %fmuladd3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %fmuladd3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @fmuladd_8f64() #0 {		define void @fmuladd_8f64() #0 {
; SSE-LABEL: @fmuladd_8f64(		; SSE-LABEL: @fmuladd_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcA64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcC64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP2]], <2 x double> [[TMP3]])
; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcB64 to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4
		; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP5]], <2 x double> [[TMP6]], <2 x double> [[TMP7]])
; SSE-NEXT: [[TMP9:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @srcC64 to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP9:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <2 x double>*), align 4
		; SSE-NEXT: [[TMP10:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: [[TMP12:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP9]], <2 x double> [[TMP10]], <2 x double> [[TMP11]])
; SSE-NEXT: [[TMP13:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP1]], <2 x double> [[TMP5]], <2 x double> [[TMP9]])		; SSE-NEXT: store <2 x double> [[TMP12]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: [[TMP14:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP2]], <2 x double> [[TMP6]], <2 x double> [[TMP10]])		; SSE-NEXT: [[TMP13:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: [[TMP15:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP3]], <2 x double> [[TMP7]], <2 x double> [[TMP11]])		; SSE-NEXT: [[TMP14:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: [[TMP16:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP4]], <2 x double> [[TMP8]], <2 x double> [[TMP12]])		; SSE-NEXT: [[TMP15:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP13]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP16:%.*]] = call <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[TMP13]], <2 x double> [[TMP14]], <2 x double> [[TMP15]])
; SSE-NEXT: store <2 x double> [[TMP14]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP15]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP16]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP16]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @fmuladd_8f64(		; AVX256-LABEL: @fmuladd_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcA64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcB64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcC64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP4:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.fmuladd.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP2]], <4 x double> [[TMP3]])
; AVX256-NEXT: [[TMP5:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @srcC64 to <4 x double>*), align 4		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP6:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: [[TMP5:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcA64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: [[TMP7:%.*]] = call <4 x double> @llvm.fmuladd.v4f64(<4 x double> [[TMP1]], <4 x double> [[TMP3]], <4 x double> [[TMP5]])		; AVX256-NEXT: [[TMP6:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcB64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: [[TMP8:%.*]] = call <4 x double> @llvm.fmuladd.v4f64(<4 x double> [[TMP2]], <4 x double> [[TMP4]], <4 x double> [[TMP6]])		; AVX256-NEXT: [[TMP7:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @srcC64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: store <4 x double> [[TMP7]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP8:%.*]] = call <4 x double> @llvm.fmuladd.v4f64(<4 x double> [[TMP5]], <4 x double> [[TMP6]], <4 x double> [[TMP7]])
; AVX256-NEXT: store <4 x double> [[TMP8]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: store <4 x double> [[TMP8]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @fmuladd_8f64(		; AVX512-LABEL: @fmuladd_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcA64 to <8 x double>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcA64 to <8 x double>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcB64 to <8 x double>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcB64 to <8 x double>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcC64 to <8 x double>*), align 4		; AVX512-NEXT: [[TMP3:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @srcC64 to <8 x double>*), align 4
; AVX512-NEXT: [[TMP4:%.*]] = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> [[TMP1]], <8 x double> [[TMP2]], <8 x double> [[TMP3]])		; AVX512-NEXT: [[TMP4:%.*]] = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> [[TMP1]], <8 x double> [[TMP2]], <8 x double> [[TMP3]])
store float %fmuladd2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4		store float %fmuladd2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4
store float %fmuladd3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %fmuladd3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @fmuladd_8f32() #0 {		define void @fmuladd_8f32() #0 {
; SSE-LABEL: @fmuladd_8f32(		; SSE-LABEL: @fmuladd_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcC32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x float> [[TMP3]])
; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcC32 to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP3]], <4 x float> [[TMP5]])		; SSE-NEXT: [[TMP6:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP2]], <4 x float> [[TMP4]], <4 x float> [[TMP6]])		; SSE-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP5]], <4 x float> [[TMP6]], <4 x float> [[TMP7]])
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fmuladd_8f32(		; AVX-LABEL: @fmuladd_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcC32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcC32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x float> [[TMP3]])		; AVX-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x float> [[TMP3]])
store float %fmuladd6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4		store float %fmuladd6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4
store float %fmuladd7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %fmuladd7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @fmuladd_16f32() #0 {		define void @fmuladd_16f32() #0 {
; SSE-LABEL: @fmuladd_16f32(		; SSE-LABEL: @fmuladd_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcA32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcC32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP2]], <4 x float> [[TMP3]])
; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcB32 to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
		; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP5]], <4 x float> [[TMP6]], <4 x float> [[TMP7]])
; SSE-NEXT: [[TMP9:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @srcC32 to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP9:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <4 x float>*), align 4
		; SSE-NEXT: [[TMP10:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: [[TMP12:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP9]], <4 x float> [[TMP10]], <4 x float> [[TMP11]])
; SSE-NEXT: [[TMP13:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP1]], <4 x float> [[TMP5]], <4 x float> [[TMP9]])		; SSE-NEXT: store <4 x float> [[TMP12]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: [[TMP14:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP2]], <4 x float> [[TMP6]], <4 x float> [[TMP10]])		; SSE-NEXT: [[TMP13:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: [[TMP15:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP3]], <4 x float> [[TMP7]], <4 x float> [[TMP11]])		; SSE-NEXT: [[TMP14:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: [[TMP16:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP4]], <4 x float> [[TMP8]], <4 x float> [[TMP12]])		; SSE-NEXT: [[TMP15:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP13]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP16:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP13]], <4 x float> [[TMP14]], <4 x float> [[TMP15]])
; SSE-NEXT: store <4 x float> [[TMP14]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP15]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP16]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP16]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @fmuladd_16f32(		; AVX256-LABEL: @fmuladd_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcA32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcB32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcC32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP4:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x float> [[TMP3]])
; AVX256-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @srcC32 to <8 x float>*), align 4		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP6:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcA32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: [[TMP7:%.*]] = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> [[TMP1]], <8 x float> [[TMP3]], <8 x float> [[TMP5]])		; AVX256-NEXT: [[TMP6:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcB32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: [[TMP8:%.*]] = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> [[TMP2]], <8 x float> [[TMP4]], <8 x float> [[TMP6]])		; AVX256-NEXT: [[TMP7:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @srcC32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: store <8 x float> [[TMP7]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP8:%.*]] = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> [[TMP5]], <8 x float> [[TMP6]], <8 x float> [[TMP7]])
; AVX256-NEXT: store <8 x float> [[TMP8]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: store <8 x float> [[TMP8]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @fmuladd_16f32(		; AVX512-LABEL: @fmuladd_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcA32 to <16 x float>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcA32 to <16 x float>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcB32 to <16 x float>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcB32 to <16 x float>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcC32 to <16 x float>*), align 4		; AVX512-NEXT: [[TMP3:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @srcC32 to <16 x float>*), align 4
; AVX512-NEXT: [[TMP4:%.*]] = call <16 x float> @llvm.fmuladd.v16f32(<16 x float> [[TMP1]], <16 x float> [[TMP2]], <16 x float> [[TMP3]])		; AVX512-NEXT: [[TMP4:%.*]] = call <16 x float> @llvm.fmuladd.v16f32(<16 x float> [[TMP1]], <16 x float> [[TMP2]], <16 x float> [[TMP3]])

llvm/test/Transforms/SLPVectorizer/X86/fptosi-inseltpoison.ll

; AVX512-LABEL: @fptosi_8f64_8i64(		; AVX512-LABEL: @fptosi_8f64_8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i64>		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i64>
; AVX512-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; AVX256DQ-LABEL: @fptosi_8f64_8i64(		; AVX256DQ-LABEL: @fptosi_8f64_8i64(
; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
; AVX256DQ-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <4 x double> [[TMP1]] to <4 x i64>
; AVX256DQ-NEXT: [[TMP3:%.*]] = fptosi <4 x double> [[TMP1]] to <4 x i64>		; AVX256DQ-NEXT: store <4 x i64> [[TMP2]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8
; AVX256DQ-NEXT: [[TMP4:%.*]] = fptosi <4 x double> [[TMP2]] to <4 x i64>		; AVX256DQ-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
; AVX256DQ-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8		; AVX256DQ-NEXT: [[TMP4:%.*]] = fptosi <4 x double> [[TMP3]] to <4 x i64>
; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256DQ-NEXT: ret void		; AVX256DQ-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8		store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8
store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8		store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @fptosi_8f64_8i32() #0 {		define void @fptosi_8f64_8i32() #0 {
; SSE-LABEL: @fptosi_8f64_8i32(		; SSE-LABEL: @fptosi_8f64_8i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = fptosi <4 x double> [[TMP1]] to <4 x i32>
; SSE-NEXT: [[TMP3:%.*]] = fptosi <4 x double> [[TMP1]] to <4 x i32>		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = fptosi <4 x double> [[TMP2]] to <4 x i32>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = fptosi <4 x double> [[TMP3]] to <4 x i32>
; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fptosi_8f64_8i32(		; AVX-LABEL: @fptosi_8f64_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i32>		; AVX-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i32>
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
; AVX512-LABEL: @fptosi_8f32_8i64(		; AVX512-LABEL: @fptosi_8f32_8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i64>		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i64>
; AVX512-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; AVX256DQ-LABEL: @fptosi_8f32_8i64(		; AVX256DQ-LABEL: @fptosi_8f32_8i64(
; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
; AVX256DQ-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <4 x float> [[TMP1]] to <4 x i64>
; AVX256DQ-NEXT: [[TMP3:%.*]] = fptosi <4 x float> [[TMP1]] to <4 x i64>		; AVX256DQ-NEXT: store <4 x i64> [[TMP2]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8
; AVX256DQ-NEXT: [[TMP4:%.*]] = fptosi <4 x float> [[TMP2]] to <4 x i64>		; AVX256DQ-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
; AVX256DQ-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8		; AVX256DQ-NEXT: [[TMP4:%.*]] = fptosi <4 x float> [[TMP3]] to <4 x i64>
; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256DQ-NEXT: ret void		; AVX256DQ-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8		store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8
store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8		store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @fptosi_8f32_8i32() #0 {		define void @fptosi_8f32_8i32() #0 {
; SSE-LABEL: @fptosi_8f32_8i32(		; SSE-LABEL: @fptosi_8f32_8i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = fptosi <4 x float> [[TMP1]] to <4 x i32>
; SSE-NEXT: [[TMP3:%.*]] = fptosi <4 x float> [[TMP1]] to <4 x i32>		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = fptosi <4 x float> [[TMP2]] to <4 x i32>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = fptosi <4 x float> [[TMP3]] to <4 x i32>
; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fptosi_8f32_8i32(		; AVX-LABEL: @fptosi_8f32_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i32>		; AVX-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i32>
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/fptosi.ll

; AVX512-LABEL: @fptosi_8f64_8i64(		; AVX512-LABEL: @fptosi_8f64_8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i64>		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i64>
; AVX512-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; AVX256DQ-LABEL: @fptosi_8f64_8i64(		; AVX256DQ-LABEL: @fptosi_8f64_8i64(
; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
; AVX256DQ-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <4 x double> [[TMP1]] to <4 x i64>
; AVX256DQ-NEXT: [[TMP3:%.*]] = fptosi <4 x double> [[TMP1]] to <4 x i64>		; AVX256DQ-NEXT: store <4 x i64> [[TMP2]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8
; AVX256DQ-NEXT: [[TMP4:%.*]] = fptosi <4 x double> [[TMP2]] to <4 x i64>		; AVX256DQ-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
; AVX256DQ-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8		; AVX256DQ-NEXT: [[TMP4:%.*]] = fptosi <4 x double> [[TMP3]] to <4 x i64>
; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256DQ-NEXT: ret void		; AVX256DQ-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8		store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8
store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8		store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @fptosi_8f64_8i32() #0 {		define void @fptosi_8f64_8i32() #0 {
; SSE-LABEL: @fptosi_8f64_8i32(		; SSE-LABEL: @fptosi_8f64_8i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = fptosi <4 x double> [[TMP1]] to <4 x i32>
; SSE-NEXT: [[TMP3:%.*]] = fptosi <4 x double> [[TMP1]] to <4 x i32>		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = fptosi <4 x double> [[TMP2]] to <4 x i32>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = fptosi <4 x double> [[TMP3]] to <4 x i32>
; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fptosi_8f64_8i32(		; AVX-LABEL: @fptosi_8f64_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i32>		; AVX-NEXT: [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i32>
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
; AVX512-LABEL: @fptosi_8f32_8i64(		; AVX512-LABEL: @fptosi_8f32_8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i64>		; AVX512-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i64>
; AVX512-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; AVX256DQ-LABEL: @fptosi_8f32_8i64(		; AVX256DQ-LABEL: @fptosi_8f32_8i64(
; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
; AVX256DQ-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptosi <4 x float> [[TMP1]] to <4 x i64>
; AVX256DQ-NEXT: [[TMP3:%.*]] = fptosi <4 x float> [[TMP1]] to <4 x i64>		; AVX256DQ-NEXT: store <4 x i64> [[TMP2]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8
; AVX256DQ-NEXT: [[TMP4:%.*]] = fptosi <4 x float> [[TMP2]] to <4 x i64>		; AVX256DQ-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
; AVX256DQ-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8		; AVX256DQ-NEXT: [[TMP4:%.*]] = fptosi <4 x float> [[TMP3]] to <4 x i64>
; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256DQ-NEXT: ret void		; AVX256DQ-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8		store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8
store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8		store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @fptosi_8f32_8i32() #0 {		define void @fptosi_8f32_8i32() #0 {
; SSE-LABEL: @fptosi_8f32_8i32(		; SSE-LABEL: @fptosi_8f32_8i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = fptosi <4 x float> [[TMP1]] to <4 x i32>
; SSE-NEXT: [[TMP3:%.*]] = fptosi <4 x float> [[TMP1]] to <4 x i32>		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = fptosi <4 x float> [[TMP2]] to <4 x i32>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = fptosi <4 x float> [[TMP3]] to <4 x i32>
; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fptosi_8f32_8i32(		; AVX-LABEL: @fptosi_8f32_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i32>		; AVX-NEXT: [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i32>
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll

; AVX512F-LABEL: @fptoui_8f64_8i64(		; AVX512F-LABEL: @fptoui_8f64_8i64(
; AVX512F-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8		; AVX512F-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
; AVX512F-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i64>		; AVX512F-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i64>
; AVX512F-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8		; AVX512F-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8
; AVX512F-NEXT: ret void		; AVX512F-NEXT: ret void
;		;
; AVX256DQ-LABEL: @fptoui_8f64_8i64(		; AVX256DQ-LABEL: @fptoui_8f64_8i64(
; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
; AVX256DQ-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptoui <4 x double> [[TMP1]] to <4 x i64>
; AVX256DQ-NEXT: [[TMP3:%.*]] = fptoui <4 x double> [[TMP1]] to <4 x i64>		; AVX256DQ-NEXT: store <4 x i64> [[TMP2]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8
; AVX256DQ-NEXT: [[TMP4:%.*]] = fptoui <4 x double> [[TMP2]] to <4 x i64>		; AVX256DQ-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
; AVX256DQ-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8		; AVX256DQ-NEXT: [[TMP4:%.*]] = fptoui <4 x double> [[TMP3]] to <4 x i64>
; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256DQ-NEXT: ret void		; AVX256DQ-NEXT: ret void
;		;
%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8		%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8		%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8		%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8		%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8		%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8		store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8
store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8		store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @fptoui_8f64_8i32() #0 {		define void @fptoui_8f64_8i32() #0 {
; SSE-LABEL: @fptoui_8f64_8i32(		; SSE-LABEL: @fptoui_8f64_8i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = fptoui <4 x double> [[TMP1]] to <4 x i32>
; SSE-NEXT: [[TMP3:%.*]] = fptoui <4 x double> [[TMP1]] to <4 x i32>		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = fptoui <4 x double> [[TMP2]] to <4 x i32>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = fptoui <4 x double> [[TMP3]] to <4 x i32>
; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fptoui_8f64_8i32(		; AVX-LABEL: @fptoui_8f64_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i32>		; AVX-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i32>
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
; AVX512F-LABEL: @fptoui_8f32_8i64(		; AVX512F-LABEL: @fptoui_8f32_8i64(
; AVX512F-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4		; AVX512F-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; AVX512F-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[TMP1]] to <8 x i64>		; AVX512F-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[TMP1]] to <8 x i64>
; AVX512F-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8		; AVX512F-NEXT: store <8 x i64> [[TMP2]], <8 x i64>* bitcast ([8 x i64]* @dst64 to <8 x i64>*), align 8
; AVX512F-NEXT: ret void		; AVX512F-NEXT: ret void
;		;
; AVX256DQ-LABEL: @fptoui_8f32_8i64(		; AVX256DQ-LABEL: @fptoui_8f32_8i64(
; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
; AVX256DQ-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4		; AVX256DQ-NEXT: [[TMP2:%.*]] = fptoui <4 x float> [[TMP1]] to <4 x i64>
; AVX256DQ-NEXT: [[TMP3:%.*]] = fptoui <4 x float> [[TMP1]] to <4 x i64>		; AVX256DQ-NEXT: store <4 x i64> [[TMP2]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8
; AVX256DQ-NEXT: [[TMP4:%.*]] = fptoui <4 x float> [[TMP2]] to <4 x i64>		; AVX256DQ-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
; AVX256DQ-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @dst64 to <4 x i64>*), align 8		; AVX256DQ-NEXT: [[TMP4:%.*]] = fptoui <4 x float> [[TMP3]] to <4 x i64>
; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX256DQ-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX256DQ-NEXT: ret void		; AVX256DQ-NEXT: ret void
;		;
%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4		%a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4		%a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4		%a2 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4		%a3 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4		%a4 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8		store i64 %cvt6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 6), align 8
store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8		store i64 %cvt7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @dst64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @fptoui_8f32_8i32() #0 {		define void @fptoui_8f32_8i32() #0 {
; SSE-LABEL: @fptoui_8f32_8i32(		; SSE-LABEL: @fptoui_8f32_8i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = fptoui <4 x float> [[TMP1]] to <4 x i32>
; SSE-NEXT: [[TMP3:%.*]] = fptoui <4 x float> [[TMP1]] to <4 x i32>		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = fptoui <4 x float> [[TMP2]] to <4 x i32>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = fptoui <4 x float> [[TMP3]] to <4 x i32>
; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @fptoui_8f32_8i32(		; AVX-LABEL: @fptoui_8f32_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[TMP1]] to <8 x i32>		; AVX-NEXT: [[TMP2:%.*]] = fptoui <8 x float> [[TMP1]] to <8 x i32>
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/fround.ll

	; SSE2-NEXT: store double [[CEIL0]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8			; SSE2-NEXT: store double [[CEIL0]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8
	; SSE2-NEXT: store double [[CEIL1]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8			; SSE2-NEXT: store double [[CEIL1]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
	; SSE2-NEXT: store double [[CEIL2]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8			; SSE2-NEXT: store double [[CEIL2]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8
	; SSE2-NEXT: store double [[CEIL3]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8			; SSE2-NEXT: store double [[CEIL3]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @ceil_4f64(			; SSE41-LABEL: @ceil_4f64(
	; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP1]])			; SSE41-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP2]])			; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP3]])
	; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX-LABEL: @ceil_4f64(			; AVX-LABEL: @ceil_4f64(
	; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.ceil.v4f64(<4 x double> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.ceil.v4f64(<4 x double> [[TMP1]])
	; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	; SSE2-NEXT: store double [[CEIL4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 8			; SSE2-NEXT: store double [[CEIL4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 8
	; SSE2-NEXT: store double [[CEIL5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 8			; SSE2-NEXT: store double [[CEIL5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 8
	; SSE2-NEXT: store double [[CEIL6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 8			; SSE2-NEXT: store double [[CEIL6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 8
	; SSE2-NEXT: store double [[CEIL7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 8			; SSE2-NEXT: store double [[CEIL7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 8
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @ceil_8f64(			; SSE41-LABEL: @ceil_8f64(
	; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP5:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP1]])			; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP3]])
	; SSE41-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP2]])			; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP7:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP3]])			; SSE41-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP4]])			; SSE41-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP5]])
	; SSE41-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP7]])
	; SSE41-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 8
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX1-LABEL: @ceil_8f64(			; AVX1-LABEL: @ceil_8f64(
	; AVX1-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX1-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX1-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8			; AVX1-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.ceil.v4f64(<4 x double> [[TMP1]])
	; AVX1-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.ceil.v4f64(<4 x double> [[TMP1]])			; AVX1-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX1-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.ceil.v4f64(<4 x double> [[TMP2]])			; AVX1-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX1-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX1-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.ceil.v4f64(<4 x double> [[TMP3]])
	; AVX1-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8			; AVX1-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX1-NEXT: ret void			; AVX1-NEXT: ret void
	;			;
	; AVX2-LABEL: @ceil_8f64(			; AVX2-LABEL: @ceil_8f64(
	; AVX2-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX2-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX2-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8			; AVX2-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.ceil.v4f64(<4 x double> [[TMP1]])
	; AVX2-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.ceil.v4f64(<4 x double> [[TMP1]])			; AVX2-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX2-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.ceil.v4f64(<4 x double> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX2-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX2-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.ceil.v4f64(<4 x double> [[TMP3]])
	; AVX2-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8			; AVX2-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @ceil_8f64(			; AVX512-LABEL: @ceil_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8			; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
	; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.ceil.v8f64(<8 x double> [[TMP1]])			; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.ceil.v8f64(<8 x double> [[TMP1]])
	; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 8			; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 8
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	; SSE2-NEXT: store double [[FLOOR0]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8			; SSE2-NEXT: store double [[FLOOR0]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8
	; SSE2-NEXT: store double [[FLOOR1]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8			; SSE2-NEXT: store double [[FLOOR1]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
	; SSE2-NEXT: store double [[FLOOR2]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8			; SSE2-NEXT: store double [[FLOOR2]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8
	; SSE2-NEXT: store double [[FLOOR3]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8			; SSE2-NEXT: store double [[FLOOR3]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @floor_4f64(			; SSE41-LABEL: @floor_4f64(
	; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP1]])			; SSE41-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP2]])			; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP3]])
	; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX-LABEL: @floor_4f64(			; AVX-LABEL: @floor_4f64(
	; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.floor.v4f64(<4 x double> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.floor.v4f64(<4 x double> [[TMP1]])
	; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	; SSE2-NEXT: store double [[FLOOR4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 8			; SSE2-NEXT: store double [[FLOOR4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 8
	; SSE2-NEXT: store double [[FLOOR5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 8			; SSE2-NEXT: store double [[FLOOR5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 8
	; SSE2-NEXT: store double [[FLOOR6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 8			; SSE2-NEXT: store double [[FLOOR6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 8
	; SSE2-NEXT: store double [[FLOOR7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 8			; SSE2-NEXT: store double [[FLOOR7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 8
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @floor_8f64(			; SSE41-LABEL: @floor_8f64(
	; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP5:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP1]])			; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP3]])
	; SSE41-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP2]])			; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP7:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP3]])			; SSE41-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP4]])			; SSE41-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP5]])
	; SSE41-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP7]])
	; SSE41-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 8
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX1-LABEL: @floor_8f64(			; AVX1-LABEL: @floor_8f64(
	; AVX1-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX1-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX1-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8			; AVX1-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.floor.v4f64(<4 x double> [[TMP1]])
	; AVX1-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.floor.v4f64(<4 x double> [[TMP1]])			; AVX1-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX1-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.floor.v4f64(<4 x double> [[TMP2]])			; AVX1-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX1-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX1-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.floor.v4f64(<4 x double> [[TMP3]])
	; AVX1-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8			; AVX1-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX1-NEXT: ret void			; AVX1-NEXT: ret void
	;			;
	; AVX2-LABEL: @floor_8f64(			; AVX2-LABEL: @floor_8f64(
	; AVX2-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX2-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX2-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8			; AVX2-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.floor.v4f64(<4 x double> [[TMP1]])
	; AVX2-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.floor.v4f64(<4 x double> [[TMP1]])			; AVX2-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX2-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.floor.v4f64(<4 x double> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX2-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX2-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.floor.v4f64(<4 x double> [[TMP3]])
	; AVX2-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8			; AVX2-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @floor_8f64(			; AVX512-LABEL: @floor_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8			; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
	; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.floor.v8f64(<8 x double> [[TMP1]])			; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.floor.v8f64(<8 x double> [[TMP1]])
	; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 8			; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 8
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	; SSE2-NEXT: store double [[NEARBYINT0]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8			; SSE2-NEXT: store double [[NEARBYINT0]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8
	; SSE2-NEXT: store double [[NEARBYINT1]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8			; SSE2-NEXT: store double [[NEARBYINT1]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
	; SSE2-NEXT: store double [[NEARBYINT2]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8			; SSE2-NEXT: store double [[NEARBYINT2]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8
	; SSE2-NEXT: store double [[NEARBYINT3]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8			; SSE2-NEXT: store double [[NEARBYINT3]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @nearbyint_4f64(			; SSE41-LABEL: @nearbyint_4f64(
	; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP1]])			; SSE41-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP2]])			; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP3]])
	; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX-LABEL: @nearbyint_4f64(			; AVX-LABEL: @nearbyint_4f64(
	; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.nearbyint.v4f64(<4 x double> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.nearbyint.v4f64(<4 x double> [[TMP1]])
	; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	; SSE2-NEXT: store double [[NEARBYINT4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 8			; SSE2-NEXT: store double [[NEARBYINT4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 8
	; SSE2-NEXT: store double [[NEARBYINT5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 8			; SSE2-NEXT: store double [[NEARBYINT5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 8
	; SSE2-NEXT: store double [[NEARBYINT6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 8			; SSE2-NEXT: store double [[NEARBYINT6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 8
	; SSE2-NEXT: store double [[NEARBYINT7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 8			; SSE2-NEXT: store double [[NEARBYINT7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 8
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @nearbyint_8f64(			; SSE41-LABEL: @nearbyint_8f64(
	; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP5:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP1]])			; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP3]])
	; SSE41-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP2]])			; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP7:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP3]])			; SSE41-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP4]])			; SSE41-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP5]])
	; SSE41-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP7]])
	; SSE41-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 8
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX1-LABEL: @nearbyint_8f64(			; AVX1-LABEL: @nearbyint_8f64(
	; AVX1-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX1-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX1-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8			; AVX1-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.nearbyint.v4f64(<4 x double> [[TMP1]])
	; AVX1-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.nearbyint.v4f64(<4 x double> [[TMP1]])			; AVX1-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX1-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.nearbyint.v4f64(<4 x double> [[TMP2]])			; AVX1-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX1-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX1-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.nearbyint.v4f64(<4 x double> [[TMP3]])
	; AVX1-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8			; AVX1-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX1-NEXT: ret void			; AVX1-NEXT: ret void
	;			;
	; AVX2-LABEL: @nearbyint_8f64(			; AVX2-LABEL: @nearbyint_8f64(
	; AVX2-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX2-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX2-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8			; AVX2-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.nearbyint.v4f64(<4 x double> [[TMP1]])
	; AVX2-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.nearbyint.v4f64(<4 x double> [[TMP1]])			; AVX2-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX2-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.nearbyint.v4f64(<4 x double> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX2-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX2-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.nearbyint.v4f64(<4 x double> [[TMP3]])
	; AVX2-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8			; AVX2-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @nearbyint_8f64(			; AVX512-LABEL: @nearbyint_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8			; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
	; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.nearbyint.v8f64(<8 x double> [[TMP1]])			; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.nearbyint.v8f64(<8 x double> [[TMP1]])
	; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 8			; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 8
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	; SSE2-NEXT: store double [[RINT0]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8			; SSE2-NEXT: store double [[RINT0]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8
	; SSE2-NEXT: store double [[RINT1]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8			; SSE2-NEXT: store double [[RINT1]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
	; SSE2-NEXT: store double [[RINT2]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8			; SSE2-NEXT: store double [[RINT2]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8
	; SSE2-NEXT: store double [[RINT3]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8			; SSE2-NEXT: store double [[RINT3]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @rint_4f64(			; SSE41-LABEL: @rint_4f64(
	; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP1]])			; SSE41-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP2]])			; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP3]])
	; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX-LABEL: @rint_4f64(			; AVX-LABEL: @rint_4f64(
	; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.rint.v4f64(<4 x double> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.rint.v4f64(<4 x double> [[TMP1]])
	; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	; SSE2-NEXT: store double [[RINT4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 8			; SSE2-NEXT: store double [[RINT4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 8
	; SSE2-NEXT: store double [[RINT5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 8			; SSE2-NEXT: store double [[RINT5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 8
	; SSE2-NEXT: store double [[RINT6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 8			; SSE2-NEXT: store double [[RINT6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 8
	; SSE2-NEXT: store double [[RINT7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 8			; SSE2-NEXT: store double [[RINT7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 8
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @rint_8f64(			; SSE41-LABEL: @rint_8f64(
	; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP5:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP1]])			; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP3]])
	; SSE41-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP2]])			; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP7:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP3]])			; SSE41-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP4]])			; SSE41-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP5]])
	; SSE41-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP7]])
	; SSE41-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 8
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX1-LABEL: @rint_8f64(			; AVX1-LABEL: @rint_8f64(
	; AVX1-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX1-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX1-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8			; AVX1-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.rint.v4f64(<4 x double> [[TMP1]])
	; AVX1-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.rint.v4f64(<4 x double> [[TMP1]])			; AVX1-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX1-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.rint.v4f64(<4 x double> [[TMP2]])			; AVX1-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX1-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX1-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.rint.v4f64(<4 x double> [[TMP3]])
	; AVX1-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8			; AVX1-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX1-NEXT: ret void			; AVX1-NEXT: ret void
	;			;
	; AVX2-LABEL: @rint_8f64(			; AVX2-LABEL: @rint_8f64(
	; AVX2-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX2-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX2-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8			; AVX2-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.rint.v4f64(<4 x double> [[TMP1]])
	; AVX2-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.rint.v4f64(<4 x double> [[TMP1]])			; AVX2-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX2-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.rint.v4f64(<4 x double> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX2-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX2-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.rint.v4f64(<4 x double> [[TMP3]])
	; AVX2-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8			; AVX2-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @rint_8f64(			; AVX512-LABEL: @rint_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8			; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
	; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.rint.v8f64(<8 x double> [[TMP1]])			; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.rint.v8f64(<8 x double> [[TMP1]])
	; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 8			; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 8
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	; SSE2-NEXT: store double [[TRUNC0]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8			; SSE2-NEXT: store double [[TRUNC0]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8
	; SSE2-NEXT: store double [[TRUNC1]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8			; SSE2-NEXT: store double [[TRUNC1]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
	; SSE2-NEXT: store double [[TRUNC2]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8			; SSE2-NEXT: store double [[TRUNC2]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8
	; SSE2-NEXT: store double [[TRUNC3]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8			; SSE2-NEXT: store double [[TRUNC3]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @trunc_4f64(			; SSE41-LABEL: @trunc_4f64(
	; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP1]])			; SSE41-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP2]])			; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP3]])
	; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX-LABEL: @trunc_4f64(			; AVX-LABEL: @trunc_4f64(
	; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.trunc.v4f64(<4 x double> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.trunc.v4f64(<4 x double> [[TMP1]])
	; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	; SSE2-NEXT: store double [[TRUNC4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 8			; SSE2-NEXT: store double [[TRUNC4]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4), align 8
	; SSE2-NEXT: store double [[TRUNC5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 8			; SSE2-NEXT: store double [[TRUNC5]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 5), align 8
	; SSE2-NEXT: store double [[TRUNC6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 8			; SSE2-NEXT: store double [[TRUNC6]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6), align 8
	; SSE2-NEXT: store double [[TRUNC7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 8			; SSE2-NEXT: store double [[TRUNC7]], double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 7), align 8
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @trunc_8f64(			; SSE41-LABEL: @trunc_8f64(
	; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8			; SSE41-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP5:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP1]])			; SSE41-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP3]])
	; SSE41-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP2]])			; SSE41-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP7:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP3]])			; SSE41-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 8
	; SSE41-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP4]])			; SSE41-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP5]])
	; SSE41-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 8
	; SSE41-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 8			; SSE41-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP7]])
	; SSE41-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 8			; SSE41-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 8
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX1-LABEL: @trunc_8f64(			; AVX1-LABEL: @trunc_8f64(
	; AVX1-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX1-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX1-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8			; AVX1-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.trunc.v4f64(<4 x double> [[TMP1]])
	; AVX1-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.trunc.v4f64(<4 x double> [[TMP1]])			; AVX1-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX1-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.trunc.v4f64(<4 x double> [[TMP2]])			; AVX1-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX1-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX1-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.trunc.v4f64(<4 x double> [[TMP3]])
	; AVX1-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8			; AVX1-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX1-NEXT: ret void			; AVX1-NEXT: ret void
	;			;
	; AVX2-LABEL: @trunc_8f64(			; AVX2-LABEL: @trunc_8f64(
	; AVX2-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; AVX2-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; AVX2-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8			; AVX2-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.trunc.v4f64(<4 x double> [[TMP1]])
	; AVX2-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.trunc.v4f64(<4 x double> [[TMP1]])			; AVX2-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
	; AVX2-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.trunc.v4f64(<4 x double> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX2-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8			; AVX2-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.trunc.v4f64(<4 x double> [[TMP3]])
	; AVX2-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8			; AVX2-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 8
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @trunc_8f64(			; AVX512-LABEL: @trunc_8f64(
	; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8			; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
	; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.trunc.v8f64(<8 x double> [[TMP1]])			; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.trunc.v8f64(<8 x double> [[TMP1]])
	; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 8			; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 8
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	; SSE2-NEXT: store float [[CEIL4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 4			; SSE2-NEXT: store float [[CEIL4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 4
	; SSE2-NEXT: store float [[CEIL5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4			; SSE2-NEXT: store float [[CEIL5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4
	; SSE2-NEXT: store float [[CEIL6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4			; SSE2-NEXT: store float [[CEIL6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4
	; SSE2-NEXT: store float [[CEIL7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4			; SSE2-NEXT: store float [[CEIL7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @ceil_8f32(			; SSE41-LABEL: @ceil_8f32(
	; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP1]])			; SSE41-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP2]])			; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP3]])
	; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX-LABEL: @ceil_8f32(			; AVX-LABEL: @ceil_8f32(
	; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.ceil.v8f32(<8 x float> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.ceil.v8f32(<8 x float> [[TMP1]])
	; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	; SSE2-NEXT: store float [[CEIL12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4			; SSE2-NEXT: store float [[CEIL12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4
	; SSE2-NEXT: store float [[CEIL13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4			; SSE2-NEXT: store float [[CEIL13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4
	; SSE2-NEXT: store float [[CEIL14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4			; SSE2-NEXT: store float [[CEIL14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4
	; SSE2-NEXT: store float [[CEIL15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4			; SSE2-NEXT: store float [[CEIL15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @ceil_16f32(			; SSE41-LABEL: @ceil_16f32(
	; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP5:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP1]])			; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP3]])
	; SSE41-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP2]])			; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP7:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP3]])			; SSE41-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP4]])			; SSE41-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP5]])
	; SSE41-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.ceil.v4f32(<4 x float> [[TMP7]])
	; SSE41-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX1-LABEL: @ceil_16f32(			; AVX1-LABEL: @ceil_16f32(
	; AVX1-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX1-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX1-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4			; AVX1-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.ceil.v8f32(<8 x float> [[TMP1]])
	; AVX1-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.ceil.v8f32(<8 x float> [[TMP1]])			; AVX1-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX1-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.ceil.v8f32(<8 x float> [[TMP2]])			; AVX1-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX1-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX1-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.ceil.v8f32(<8 x float> [[TMP3]])
	; AVX1-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4			; AVX1-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX1-NEXT: ret void			; AVX1-NEXT: ret void
	;			;
	; AVX2-LABEL: @ceil_16f32(			; AVX2-LABEL: @ceil_16f32(
	; AVX2-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX2-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX2-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4			; AVX2-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.ceil.v8f32(<8 x float> [[TMP1]])
	; AVX2-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.ceil.v8f32(<8 x float> [[TMP1]])			; AVX2-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX2-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.ceil.v8f32(<8 x float> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX2-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX2-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.ceil.v8f32(<8 x float> [[TMP3]])
	; AVX2-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4			; AVX2-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @ceil_16f32(			; AVX512-LABEL: @ceil_16f32(
	; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4			; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4
	; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.ceil.v16f32(<16 x float> [[TMP1]])			; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.ceil.v16f32(<16 x float> [[TMP1]])
	; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4			; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	; SSE2-NEXT: store float [[FLOOR4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 4			; SSE2-NEXT: store float [[FLOOR4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 4
	; SSE2-NEXT: store float [[FLOOR5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4			; SSE2-NEXT: store float [[FLOOR5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4
	; SSE2-NEXT: store float [[FLOOR6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4			; SSE2-NEXT: store float [[FLOOR6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4
	; SSE2-NEXT: store float [[FLOOR7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4			; SSE2-NEXT: store float [[FLOOR7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @floor_8f32(			; SSE41-LABEL: @floor_8f32(
	; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP1]])			; SSE41-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP2]])			; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP3]])
	; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX-LABEL: @floor_8f32(			; AVX-LABEL: @floor_8f32(
	; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.floor.v8f32(<8 x float> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.floor.v8f32(<8 x float> [[TMP1]])
	; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	; SSE2-NEXT: store float [[FLOOR12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4			; SSE2-NEXT: store float [[FLOOR12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4
	; SSE2-NEXT: store float [[FLOOR13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4			; SSE2-NEXT: store float [[FLOOR13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4
	; SSE2-NEXT: store float [[FLOOR14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4			; SSE2-NEXT: store float [[FLOOR14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4
	; SSE2-NEXT: store float [[FLOOR15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4			; SSE2-NEXT: store float [[FLOOR15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @floor_16f32(			; SSE41-LABEL: @floor_16f32(
	; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP5:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP1]])			; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP3]])
	; SSE41-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP2]])			; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP7:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP3]])			; SSE41-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP4]])			; SSE41-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP5]])
	; SSE41-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.floor.v4f32(<4 x float> [[TMP7]])
	; SSE41-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX1-LABEL: @floor_16f32(			; AVX1-LABEL: @floor_16f32(
	; AVX1-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX1-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX1-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4			; AVX1-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.floor.v8f32(<8 x float> [[TMP1]])
	; AVX1-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.floor.v8f32(<8 x float> [[TMP1]])			; AVX1-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX1-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.floor.v8f32(<8 x float> [[TMP2]])			; AVX1-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX1-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX1-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.floor.v8f32(<8 x float> [[TMP3]])
	; AVX1-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4			; AVX1-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX1-NEXT: ret void			; AVX1-NEXT: ret void
	;			;
	; AVX2-LABEL: @floor_16f32(			; AVX2-LABEL: @floor_16f32(
	; AVX2-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX2-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX2-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4			; AVX2-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.floor.v8f32(<8 x float> [[TMP1]])
	; AVX2-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.floor.v8f32(<8 x float> [[TMP1]])			; AVX2-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX2-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.floor.v8f32(<8 x float> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX2-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX2-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.floor.v8f32(<8 x float> [[TMP3]])
	; AVX2-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4			; AVX2-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @floor_16f32(			; AVX512-LABEL: @floor_16f32(
	; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4			; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4
	; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.floor.v16f32(<16 x float> [[TMP1]])			; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.floor.v16f32(<16 x float> [[TMP1]])
	; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4			; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	[117 lines not shown]
	; SSE2-NEXT: store float [[NEARBYINT4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 4			; SSE2-NEXT: store float [[NEARBYINT4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 4
	; SSE2-NEXT: store float [[NEARBYINT5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4			; SSE2-NEXT: store float [[NEARBYINT5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4
	; SSE2-NEXT: store float [[NEARBYINT6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4			; SSE2-NEXT: store float [[NEARBYINT6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4
	; SSE2-NEXT: store float [[NEARBYINT7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4			; SSE2-NEXT: store float [[NEARBYINT7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @nearbyint_8f32(			; SSE41-LABEL: @nearbyint_8f32(
	; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP1]])			; SSE41-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP2]])			; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP3]])
	; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX-LABEL: @nearbyint_8f32(			; AVX-LABEL: @nearbyint_8f32(
	; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.nearbyint.v8f32(<8 x float> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.nearbyint.v8f32(<8 x float> [[TMP1]])
	; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	[74 lines not shown]
	; SSE2-NEXT: store float [[NEARBYINT12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4			; SSE2-NEXT: store float [[NEARBYINT12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4
	; SSE2-NEXT: store float [[NEARBYINT13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4			; SSE2-NEXT: store float [[NEARBYINT13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4
	; SSE2-NEXT: store float [[NEARBYINT14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4			; SSE2-NEXT: store float [[NEARBYINT14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4
	; SSE2-NEXT: store float [[NEARBYINT15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4			; SSE2-NEXT: store float [[NEARBYINT15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @nearbyint_16f32(			; SSE41-LABEL: @nearbyint_16f32(
	; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP5:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP1]])			; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP3]])
	; SSE41-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP2]])			; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP7:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP3]])			; SSE41-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP4]])			; SSE41-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP5]])
	; SSE41-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[TMP7]])
	; SSE41-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX1-LABEL: @nearbyint_16f32(			; AVX1-LABEL: @nearbyint_16f32(
	; AVX1-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX1-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX1-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4			; AVX1-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.nearbyint.v8f32(<8 x float> [[TMP1]])
	; AVX1-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.nearbyint.v8f32(<8 x float> [[TMP1]])			; AVX1-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX1-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.nearbyint.v8f32(<8 x float> [[TMP2]])			; AVX1-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX1-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX1-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.nearbyint.v8f32(<8 x float> [[TMP3]])
	; AVX1-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4			; AVX1-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX1-NEXT: ret void			; AVX1-NEXT: ret void
	;			;
	; AVX2-LABEL: @nearbyint_16f32(			; AVX2-LABEL: @nearbyint_16f32(
	; AVX2-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX2-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX2-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4			; AVX2-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.nearbyint.v8f32(<8 x float> [[TMP1]])
	; AVX2-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.nearbyint.v8f32(<8 x float> [[TMP1]])			; AVX2-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX2-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.nearbyint.v8f32(<8 x float> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX2-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX2-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.nearbyint.v8f32(<8 x float> [[TMP3]])
	; AVX2-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4			; AVX2-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @nearbyint_16f32(			; AVX512-LABEL: @nearbyint_16f32(
	; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4			; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4
	; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.nearbyint.v16f32(<16 x float> [[TMP1]])			; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.nearbyint.v16f32(<16 x float> [[TMP1]])
	; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4			; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	[117 lines not shown]
	; SSE2-NEXT: store float [[RINT4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 4			; SSE2-NEXT: store float [[RINT4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 4
	; SSE2-NEXT: store float [[RINT5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4			; SSE2-NEXT: store float [[RINT5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4
	; SSE2-NEXT: store float [[RINT6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4			; SSE2-NEXT: store float [[RINT6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4
	; SSE2-NEXT: store float [[RINT7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4			; SSE2-NEXT: store float [[RINT7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @rint_8f32(			; SSE41-LABEL: @rint_8f32(
	; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP1]])			; SSE41-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP2]])			; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP3]])
	; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX-LABEL: @rint_8f32(			; AVX-LABEL: @rint_8f32(
	; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.rint.v8f32(<8 x float> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.rint.v8f32(<8 x float> [[TMP1]])
	; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	[74 lines not shown]
	; SSE2-NEXT: store float [[RINT12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4			; SSE2-NEXT: store float [[RINT12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4
	; SSE2-NEXT: store float [[RINT13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4			; SSE2-NEXT: store float [[RINT13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4
	; SSE2-NEXT: store float [[RINT14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4			; SSE2-NEXT: store float [[RINT14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4
	; SSE2-NEXT: store float [[RINT15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4			; SSE2-NEXT: store float [[RINT15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @rint_16f32(			; SSE41-LABEL: @rint_16f32(
	; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP5:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP1]])			; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP3]])
	; SSE41-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP2]])			; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP7:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP3]])			; SSE41-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP4]])			; SSE41-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP5]])
	; SSE41-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.rint.v4f32(<4 x float> [[TMP7]])
	; SSE41-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX1-LABEL: @rint_16f32(			; AVX1-LABEL: @rint_16f32(
	; AVX1-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX1-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX1-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4			; AVX1-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.rint.v8f32(<8 x float> [[TMP1]])
	; AVX1-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.rint.v8f32(<8 x float> [[TMP1]])			; AVX1-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX1-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.rint.v8f32(<8 x float> [[TMP2]])			; AVX1-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX1-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX1-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.rint.v8f32(<8 x float> [[TMP3]])
	; AVX1-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4			; AVX1-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX1-NEXT: ret void			; AVX1-NEXT: ret void
	;			;
	; AVX2-LABEL: @rint_16f32(			; AVX2-LABEL: @rint_16f32(
	; AVX2-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX2-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX2-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4			; AVX2-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.rint.v8f32(<8 x float> [[TMP1]])
	; AVX2-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.rint.v8f32(<8 x float> [[TMP1]])			; AVX2-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX2-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.rint.v8f32(<8 x float> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX2-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX2-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.rint.v8f32(<8 x float> [[TMP3]])
	; AVX2-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4			; AVX2-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @rint_16f32(			; AVX512-LABEL: @rint_16f32(
	; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4			; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4
	; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.rint.v16f32(<16 x float> [[TMP1]])			; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.rint.v16f32(<16 x float> [[TMP1]])
	; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4			; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	[117 lines not shown]
	; SSE2-NEXT: store float [[TRUNC4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 4			; SSE2-NEXT: store float [[TRUNC4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 4
	; SSE2-NEXT: store float [[TRUNC5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4			; SSE2-NEXT: store float [[TRUNC5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4
	; SSE2-NEXT: store float [[TRUNC6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4			; SSE2-NEXT: store float [[TRUNC6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4
	; SSE2-NEXT: store float [[TRUNC7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4			; SSE2-NEXT: store float [[TRUNC7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @trunc_8f32(			; SSE41-LABEL: @trunc_8f32(
	; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP1]])			; SSE41-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP2]])			; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP3]])
	; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX-LABEL: @trunc_8f32(			; AVX-LABEL: @trunc_8f32(
	; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.trunc.v8f32(<8 x float> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.trunc.v8f32(<8 x float> [[TMP1]])
	; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	[74 lines not shown]
	; SSE2-NEXT: store float [[TRUNC12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4			; SSE2-NEXT: store float [[TRUNC12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 4
	; SSE2-NEXT: store float [[TRUNC13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4			; SSE2-NEXT: store float [[TRUNC13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4
	; SSE2-NEXT: store float [[TRUNC14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4			; SSE2-NEXT: store float [[TRUNC14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 4
	; SSE2-NEXT: store float [[TRUNC15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4			; SSE2-NEXT: store float [[TRUNC15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4
	; SSE2-NEXT: ret void			; SSE2-NEXT: ret void
	;			;
	; SSE41-LABEL: @trunc_16f32(			; SSE41-LABEL: @trunc_16f32(
	; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4			; SSE41-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP1]])
	; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP5:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP1]])			; SSE41-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP3]])
	; SSE41-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP2]])			; SSE41-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP7:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP3]])			; SSE41-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4
	; SSE41-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP4]])			; SSE41-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP5]])
	; SSE41-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4
	; SSE41-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4			; SSE41-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.trunc.v4f32(<4 x float> [[TMP7]])
	; SSE41-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4			; SSE41-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4
	; SSE41-NEXT: ret void			; SSE41-NEXT: ret void
	;			;
	; AVX1-LABEL: @trunc_16f32(			; AVX1-LABEL: @trunc_16f32(
	; AVX1-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX1-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX1-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4			; AVX1-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.trunc.v8f32(<8 x float> [[TMP1]])
	; AVX1-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.trunc.v8f32(<8 x float> [[TMP1]])			; AVX1-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX1-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.trunc.v8f32(<8 x float> [[TMP2]])			; AVX1-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX1-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX1-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.trunc.v8f32(<8 x float> [[TMP3]])
	; AVX1-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4			; AVX1-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX1-NEXT: ret void			; AVX1-NEXT: ret void
	;			;
	; AVX2-LABEL: @trunc_16f32(			; AVX2-LABEL: @trunc_16f32(
	; AVX2-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4			; AVX2-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
	; AVX2-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4			; AVX2-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.trunc.v8f32(<8 x float> [[TMP1]])
	; AVX2-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.trunc.v8f32(<8 x float> [[TMP1]])			; AVX2-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
	; AVX2-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.trunc.v8f32(<8 x float> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX2-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4			; AVX2-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.trunc.v8f32(<8 x float> [[TMP3]])
	; AVX2-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4			; AVX2-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @trunc_16f32(			; AVX512-LABEL: @trunc_16f32(
	; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4			; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4
	; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.trunc.v16f32(<16 x float> [[TMP1]])			; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.trunc.v16f32(<16 x float> [[TMP1]])
	; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4			; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/funclet.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -slp-vectorizer < %s | FileCheck %s			; RUN: opt -S -slp-vectorizer < %s | FileCheck %s
	target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"			target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"
	target triple = "i686-pc-windows-msvc18.0.0"			target triple = "i686-pc-windows-msvc18.0.0"

	define void @test1(double* %a, double* %b, double* %c) #0 personality i32 (...)* @__CxxFrameHandler3 {			define void @test1(double* %a, double* %b, double* %c) #0 personality i32 (...)* @__CxxFrameHandler3 {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: invoke void @_CxxThrowException(i8* null, i8* null)			; CHECK-NEXT: invoke void @_CxxThrowException(i8* null, i8* null)
	; CHECK-NEXT: to label [[UNREACHABLE:%.*]] unwind label [[CATCH_DISPATCH:%.*]]			; CHECK-NEXT: to label [[UNREACHABLE:%.*]] unwind label [[CATCH_DISPATCH:%.*]]
	; CHECK: catch.dispatch:			; CHECK: catch.dispatch:
	; CHECK-NEXT: [[TMP0:%.*]] = catchswitch within none [label %catch] unwind to caller			; CHECK-NEXT: [[TMP0:%.*]] = catchswitch within none [label %catch] unwind to caller
	; CHECK: catch:			; CHECK: catch:
	; CHECK-NEXT: [[TMP1:%.*]] = catchpad within [[TMP0]] [i8* null, i32 64, i8* null]			; CHECK-NEXT: [[TMP1:%.*]] = catchpad within [[TMP0]] [i8* null, i32 64, i8* null]
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 1
				; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 1
				; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[A]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[A]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[B]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[B]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = fmul <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP6]]) [ "funclet"(token [[TMP1]]) ]			; CHECK-NEXT: [[TMP7:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP6]]) [ "funclet"(token [[TMP1]]) ]
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 1
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[C]] to <2 x double>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[C]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8			; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
	; CHECK-NEXT: catchret from [[TMP1]] to label [[TRY_CONT:%.*]]			; CHECK-NEXT: catchret from [[TMP1]] to label [[TRY_CONT:%.*]]
	; CHECK: try.cont:			; CHECK: try.cont:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: unreachable:			; CHECK: unreachable:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;

llvm/test/Transforms/SLPVectorizer/X86/gep.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S |FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S |FileCheck %s
	; RUN: opt < %s -aa-pipeline=basic-aa -passes=slp-vectorizer -S |FileCheck %s			; RUN: opt < %s -aa-pipeline=basic-aa -passes=slp-vectorizer -S |FileCheck %s
	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-unknown"			target triple = "x86_64-unknown-unknown"

	; Test if SLP can handle GEP expressions.			; Test if SLP can handle GEP expressions.
	; The test perform the following action:			; The test perform the following action:
	; x->first = y->first + 16			; x->first = y->first + 16
	; x->second = y->second + 16			; x->second = y->second + 16

	define void @foo1 ({ i32*, i32* }* noalias %x, { i32*, i32* }* noalias %y) {			define void @foo1 ({ i32*, i32* }* noalias %x, { i32*, i32* }* noalias %y) {
	; CHECK-LABEL: @foo1(			; CHECK-LABEL: @foo1(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* [[Y:%.*]], i64 0, i32 0			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* [[Y:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* [[X:%.*]], i64 0, i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* [[X:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* [[Y]], i64 0, i32 1			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* [[Y]], i64 0, i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32** [[TMP1]] to <2 x i32*>*			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* [[X]], i64 0, i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x i32*>, <2 x i32*>* [[TMP4]], align 8			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32** [[TMP1]] to <2 x i32*>*
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i32, <2 x i32*> [[TMP5]], <2 x i64> <i64 16, i64 16>			; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i32*>, <2 x i32*>* [[TMP5]], align 8
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* [[X]], i64 0, i32 1			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr i32, <2 x i32*> [[TMP6]], <2 x i64> <i64 16, i64 16>
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32** [[TMP2]] to <2 x i32*>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32** [[TMP2]] to <2 x i32*>*
	; CHECK-NEXT: store <2 x i32*> [[TMP6]], <2 x i32*>* [[TMP8]], align 8			; CHECK-NEXT: store <2 x i32*> [[TMP7]], <2 x i32*>* [[TMP8]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%1 = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* %y, i64 0, i32 0			%1 = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* %y, i64 0, i32 0
	%2 = load i32*, i32** %1, align 8			%2 = load i32*, i32** %1, align 8
	%3 = getelementptr inbounds i32, i32* %2, i64 16			%3 = getelementptr inbounds i32, i32* %2, i64 16
	%4 = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* %x, i64 0, i32 0			%4 = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* %x, i64 0, i32 0
	store i32* %3, i32** %4, align 8			store i32* %3, i32** %4, align 8
	%5 = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* %y, i64 0, i32 1			%5 = getelementptr inbounds { i32*, i32* }, { i32*, i32* }* %y, i64 0, i32 1

llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

	}			}

	define float @bazz() {			define float @bazz() {
	; CHECK-LABEL: @bazz(			; CHECK-LABEL: @bazz(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3			; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
				; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
				; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16
	; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
	; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
	; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP3]])			; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP3]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]			; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]
	; CHECK-NEXT: store float [[OP_EXTRA1]], float* @res, align 4			; CHECK-NEXT: store float [[OP_EXTRA1]], float* @res, align 4
	; CHECK-NEXT: ret float [[OP_EXTRA1]]			; CHECK-NEXT: ret float [[OP_EXTRA1]]
	;			;
	; THRESHOLD-LABEL: @bazz(			; THRESHOLD-LABEL: @bazz(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3			; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
				; THRESHOLD-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
				; THRESHOLD-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16			; THRESHOLD-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]			; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]
	; THRESHOLD-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
	; THRESHOLD-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
	; THRESHOLD-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP3]])			; THRESHOLD-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP3]])
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
	; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]			; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]
	; THRESHOLD-NEXT: store float [[OP_EXTRA1]], float* @res, align 4			; THRESHOLD-NEXT: store float [[OP_EXTRA1]], float* @res, align 4
	; THRESHOLD-NEXT: ret float [[OP_EXTRA1]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA1]]
	;			;
	entry:			entry:
	%0 = load i32, i32* @n, align 4			%0 = load i32, i32* @n, align 4

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

	; SSE-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]			; SSE-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
	; SSE-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]			; SSE-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
	; SSE-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; SSE-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	; SSE-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]			; SSE-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
	; SSE-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]			; SSE-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
	; SSE-NEXT: ret i32 [[TMP23]]			; SSE-NEXT: ret i32 [[TMP23]]
	;			;
	; AVX-LABEL: @maxi8_store_in(			; AVX-LABEL: @maxi8_store_in(
	; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
	; AVX-NEXT: store i32 0, i32* @var, align 8			; AVX-NEXT: store i32 0, i32* @var, align 8
				; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
	; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[TMP2]])			; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[TMP2]])
	; AVX-NEXT: ret i32 [[TMP3]]			; AVX-NEXT: ret i32 [[TMP3]]
	;			;
	; AVX2-LABEL: @maxi8_store_in(			; AVX2-LABEL: @maxi8_store_in(
	; AVX2-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
	; AVX2-NEXT: store i32 0, i32* @var, align 8			; AVX2-NEXT: store i32 0, i32* @var, align 8
				; AVX2-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
	; AVX2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[TMP2]])
	; AVX2-NEXT: ret i32 [[TMP3]]			; AVX2-NEXT: ret i32 [[TMP3]]
	;			;
	; THRESH-LABEL: @maxi8_store_in(			; THRESH-LABEL: @maxi8_store_in(
	; THRESH-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
	; THRESH-NEXT: store i32 0, i32* @var, align 8			; THRESH-NEXT: store i32 0, i32* @var, align 8
				; THRESH-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
	; THRESH-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[TMP2]])			; THRESH-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[TMP2]])
	; THRESH-NEXT: ret i32 [[TMP3]]			; THRESH-NEXT: ret i32 [[TMP3]]
	;			;
	%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	%4 = icmp sgt i32 %2, %3			%4 = icmp sgt i32 %2, %3
	%5 = select i1 %4, i32 %2, i32 %3			%5 = select i1 %4, i32 %2, i32 %3
	%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8

llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll

	; STORE-NEXT: [[I_039:%.*]] = phi i64 [ 0, [[FOR_BODY_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]			; STORE-NEXT: [[I_039:%.*]] = phi i64 [ 0, [[FOR_BODY_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; STORE-NEXT: [[C_ADDR_038:%.*]] = phi float* [ [[C:%.*]], [[FOR_BODY_LR_PH]] ], [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ]			; STORE-NEXT: [[C_ADDR_038:%.*]] = phi float* [ [[C:%.*]], [[FOR_BODY_LR_PH]] ], [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ]
	; STORE-NEXT: [[MUL:%.*]] = shl nsw i64 [[I_039]], 2			; STORE-NEXT: [[MUL:%.*]] = shl nsw i64 [[I_039]], 2
	; STORE-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[MUL]]			; STORE-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[MUL]]
	; STORE-NEXT: [[ADD34:%.*]] = or i64 [[MUL]], 1			; STORE-NEXT: [[ADD34:%.*]] = or i64 [[MUL]], 1
	; STORE-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD34]]			; STORE-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD34]]
	; STORE-NEXT: [[ADD1135:%.*]] = or i64 [[MUL]], 2			; STORE-NEXT: [[ADD1135:%.*]] = or i64 [[MUL]], 2
	; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1135]]			; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1135]]
	; STORE-NEXT: [[TMP1:%.*]] = bitcast float* [[B]] to <4 x float>*
	; STORE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; STORE-NEXT: [[ADD1736:%.*]] = or i64 [[MUL]], 3			; STORE-NEXT: [[ADD1736:%.*]] = or i64 [[MUL]], 3
	; STORE-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1736]]			; STORE-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1736]]
				; STORE-NEXT: [[TMP1:%.*]] = bitcast float* [[B]] to <4 x float>*
				; STORE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*			; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*
	; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4			; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
	; STORE-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP4]]			; STORE-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP4]]
	; STORE-NEXT: [[TMP6:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP5]])			; STORE-NEXT: [[TMP6:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP5]])
	; STORE-NEXT: store float [[TMP6]], float* [[C_ADDR_038]], align 4			; STORE-NEXT: store float [[TMP6]], float* [[C_ADDR_038]], align 4
	; STORE-NEXT: [[INCDEC_PTR]] = getelementptr inbounds float, float* [[C_ADDR_038]], i64 1			; STORE-NEXT: [[INCDEC_PTR]] = getelementptr inbounds float, float* [[C_ADDR_038]], i64 1
	; STORE-NEXT: [[INC]] = add nsw i64 [[I_039]], 1			; STORE-NEXT: [[INC]] = add nsw i64 [[I_039]], 1
	; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]			; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]

llvm/test/Transforms/SLPVectorizer/X86/insert-after-bundle.ll

	; SSE-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 1			; SSE-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 1
	; SSE-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 1			; SSE-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 1
	; SSE-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 2			; SSE-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 2
	; SSE-NEXT: [[ARRAYIDX23:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 2			; SSE-NEXT: [[ARRAYIDX23:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 2
	; SSE-NEXT: [[ARRAYIDX25:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 2			; SSE-NEXT: [[ARRAYIDX25:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 2
	; SSE-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 2			; SSE-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 2
	; SSE-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 2			; SSE-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 2
	; SSE-NEXT: [[ARRAYIDX33:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 3			; SSE-NEXT: [[ARRAYIDX33:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 3
				; SSE-NEXT: [[ARRAYIDX35:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 3
				; SSE-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 3
				; SSE-NEXT: [[ARRAYIDX40:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 3
				; SSE-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 3
	; SSE-NEXT: [[TMP4:%.*]] = bitcast i8* [[C_ADDR_0352]] to <4 x i8>*			; SSE-NEXT: [[TMP4:%.*]] = bitcast i8* [[C_ADDR_0352]] to <4 x i8>*
	; SSE-NEXT: [[TMP5:%.*]] = load <4 x i8>, <4 x i8>* [[TMP4]], align 1			; SSE-NEXT: [[TMP5:%.*]] = load <4 x i8>, <4 x i8>* [[TMP4]], align 1
	; SSE-NEXT: [[ARRAYIDX35:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 3
	; SSE-NEXT: [[TMP6:%.*]] = bitcast i8* [[D_ADDR_0353]] to <4 x i8>*			; SSE-NEXT: [[TMP6:%.*]] = bitcast i8* [[D_ADDR_0353]] to <4 x i8>*
	; SSE-NEXT: [[TMP7:%.*]] = load <4 x i8>, <4 x i8>* [[TMP6]], align 1			; SSE-NEXT: [[TMP7:%.*]] = load <4 x i8>, <4 x i8>* [[TMP6]], align 1
	; SSE-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 3
	; SSE-NEXT: [[TMP8:%.*]] = bitcast i8* [[A_ADDR_0355]] to <4 x i8>*			; SSE-NEXT: [[TMP8:%.*]] = bitcast i8* [[A_ADDR_0355]] to <4 x i8>*
	; SSE-NEXT: [[TMP9:%.*]] = load <4 x i8>, <4 x i8>* [[TMP8]], align 1			; SSE-NEXT: [[TMP9:%.*]] = load <4 x i8>, <4 x i8>* [[TMP8]], align 1
	; SSE-NEXT: [[ARRAYIDX40:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 3
	; SSE-NEXT: [[TMP10:%.*]] = bitcast i8* [[B_ADDR_0351]] to <4 x i8>*			; SSE-NEXT: [[TMP10:%.*]] = bitcast i8* [[B_ADDR_0351]] to <4 x i8>*
	; SSE-NEXT: [[TMP11:%.*]] = load <4 x i8>, <4 x i8>* [[TMP10]], align 1			; SSE-NEXT: [[TMP11:%.*]] = load <4 x i8>, <4 x i8>* [[TMP10]], align 1
	; SSE-NEXT: [[TMP12:%.*]] = icmp ult <4 x i8> [[TMP5]], [[TMP7]]			; SSE-NEXT: [[TMP12:%.*]] = icmp ult <4 x i8> [[TMP5]], [[TMP7]]
	; SSE-NEXT: [[TMP13:%.*]] = select <4 x i1> [[TMP12]], <4 x i8> [[TMP11]], <4 x i8> [[TMP9]]			; SSE-NEXT: [[TMP13:%.*]] = select <4 x i1> [[TMP12]], <4 x i8> [[TMP11]], <4 x i8> [[TMP9]]
	; SSE-NEXT: [[TMP14:%.*]] = zext <4 x i8> [[TMP13]] to <4 x i32>			; SSE-NEXT: [[TMP14:%.*]] = zext <4 x i8> [[TMP13]] to <4 x i32>
	; SSE-NEXT: [[TMP15:%.*]] = mul <4 x i32> [[TMP14]], [[SHUFFLE]]			; SSE-NEXT: [[TMP15:%.*]] = mul <4 x i32> [[TMP14]], [[SHUFFLE]]
	; SSE-NEXT: [[TMP16:%.*]] = trunc <4 x i32> [[TMP15]] to <4 x i8>			; SSE-NEXT: [[TMP16:%.*]] = trunc <4 x i32> [[TMP15]] to <4 x i8>
	; SSE-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 3
	; SSE-NEXT: [[TMP17:%.*]] = bitcast i8* [[E_ADDR_0354]] to <4 x i8>*			; SSE-NEXT: [[TMP17:%.*]] = bitcast i8* [[E_ADDR_0354]] to <4 x i8>*
	; SSE-NEXT: store <4 x i8> [[TMP16]], <4 x i8>* [[TMP17]], align 1			; SSE-NEXT: store <4 x i8> [[TMP16]], <4 x i8>* [[TMP17]], align 1
	; SSE-NEXT: [[ARRAYIDX45:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 4			; SSE-NEXT: [[ARRAYIDX45:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 4
	; SSE-NEXT: [[ARRAYIDX47:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 4			; SSE-NEXT: [[ARRAYIDX47:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 4
	; SSE-NEXT: [[ARRAYIDX49:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 4			; SSE-NEXT: [[ARRAYIDX49:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 4
	; SSE-NEXT: [[ARRAYIDX52:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 4			; SSE-NEXT: [[ARRAYIDX52:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 4
	; SSE-NEXT: [[ARRAYIDX56:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 4			; SSE-NEXT: [[ARRAYIDX56:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 4
	; SSE-NEXT: [[ARRAYIDX57:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 5			; SSE-NEXT: [[ARRAYIDX57:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 5
	; SSE-NEXT: [[ARRAYIDX59:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 5			; SSE-NEXT: [[ARRAYIDX59:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 5
	; SSE-NEXT: [[ARRAYIDX61:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 5			; SSE-NEXT: [[ARRAYIDX61:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 5
	; SSE-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 5			; SSE-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 5
	; SSE-NEXT: [[ARRAYIDX68:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 5			; SSE-NEXT: [[ARRAYIDX68:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 5
	; SSE-NEXT: [[ARRAYIDX69:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 6			; SSE-NEXT: [[ARRAYIDX69:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 6
	; SSE-NEXT: [[ARRAYIDX71:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 6			; SSE-NEXT: [[ARRAYIDX71:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 6
	; SSE-NEXT: [[ARRAYIDX73:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 6			; SSE-NEXT: [[ARRAYIDX73:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 6
	; SSE-NEXT: [[ARRAYIDX76:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 6			; SSE-NEXT: [[ARRAYIDX76:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 6
	; SSE-NEXT: [[ARRAYIDX80:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 6			; SSE-NEXT: [[ARRAYIDX80:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 6
	; SSE-NEXT: [[ARRAYIDX81:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 7			; SSE-NEXT: [[ARRAYIDX81:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 7
				; SSE-NEXT: [[ARRAYIDX83:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 7
				; SSE-NEXT: [[ARRAYIDX85:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 7
				; SSE-NEXT: [[ARRAYIDX88:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 7
				; SSE-NEXT: [[ARRAYIDX92:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 7
	; SSE-NEXT: [[TMP18:%.*]] = bitcast i8* [[ARRAYIDX45]] to <4 x i8>*			; SSE-NEXT: [[TMP18:%.*]] = bitcast i8* [[ARRAYIDX45]] to <4 x i8>*
	; SSE-NEXT: [[TMP19:%.*]] = load <4 x i8>, <4 x i8>* [[TMP18]], align 1			; SSE-NEXT: [[TMP19:%.*]] = load <4 x i8>, <4 x i8>* [[TMP18]], align 1
	; SSE-NEXT: [[ARRAYIDX83:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 7
	; SSE-NEXT: [[TMP20:%.*]] = bitcast i8* [[ARRAYIDX47]] to <4 x i8>*			; SSE-NEXT: [[TMP20:%.*]] = bitcast i8* [[ARRAYIDX47]] to <4 x i8>*
	; SSE-NEXT: [[TMP21:%.*]] = load <4 x i8>, <4 x i8>* [[TMP20]], align 1			; SSE-NEXT: [[TMP21:%.*]] = load <4 x i8>, <4 x i8>* [[TMP20]], align 1
	; SSE-NEXT: [[ARRAYIDX85:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 7
	; SSE-NEXT: [[TMP22:%.*]] = bitcast i8* [[ARRAYIDX49]] to <4 x i8>*			; SSE-NEXT: [[TMP22:%.*]] = bitcast i8* [[ARRAYIDX49]] to <4 x i8>*
	; SSE-NEXT: [[TMP23:%.*]] = load <4 x i8>, <4 x i8>* [[TMP22]], align 1			; SSE-NEXT: [[TMP23:%.*]] = load <4 x i8>, <4 x i8>* [[TMP22]], align 1
	; SSE-NEXT: [[ARRAYIDX88:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 7
	; SSE-NEXT: [[TMP24:%.*]] = bitcast i8* [[ARRAYIDX52]] to <4 x i8>*			; SSE-NEXT: [[TMP24:%.*]] = bitcast i8* [[ARRAYIDX52]] to <4 x i8>*
	; SSE-NEXT: [[TMP25:%.*]] = load <4 x i8>, <4 x i8>* [[TMP24]], align 1			; SSE-NEXT: [[TMP25:%.*]] = load <4 x i8>, <4 x i8>* [[TMP24]], align 1
	; SSE-NEXT: [[TMP26:%.*]] = icmp ult <4 x i8> [[TMP19]], [[TMP21]]			; SSE-NEXT: [[TMP26:%.*]] = icmp ult <4 x i8> [[TMP19]], [[TMP21]]
	; SSE-NEXT: [[TMP27:%.*]] = select <4 x i1> [[TMP26]], <4 x i8> [[TMP25]], <4 x i8> [[TMP23]]			; SSE-NEXT: [[TMP27:%.*]] = select <4 x i1> [[TMP26]], <4 x i8> [[TMP25]], <4 x i8> [[TMP23]]
	; SSE-NEXT: [[TMP28:%.*]] = zext <4 x i8> [[TMP27]] to <4 x i32>			; SSE-NEXT: [[TMP28:%.*]] = zext <4 x i8> [[TMP27]] to <4 x i32>
	; SSE-NEXT: [[TMP29:%.*]] = mul <4 x i32> [[TMP28]], [[SHUFFLE1]]			; SSE-NEXT: [[TMP29:%.*]] = mul <4 x i32> [[TMP28]], [[SHUFFLE1]]
	; SSE-NEXT: [[TMP30:%.*]] = trunc <4 x i32> [[TMP29]] to <4 x i8>			; SSE-NEXT: [[TMP30:%.*]] = trunc <4 x i32> [[TMP29]] to <4 x i8>
	; SSE-NEXT: [[ARRAYIDX92:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 7
	; SSE-NEXT: [[TMP31:%.*]] = bitcast i8* [[ARRAYIDX56]] to <4 x i8>*			; SSE-NEXT: [[TMP31:%.*]] = bitcast i8* [[ARRAYIDX56]] to <4 x i8>*
	; SSE-NEXT: store <4 x i8> [[TMP30]], <4 x i8>* [[TMP31]], align 1			; SSE-NEXT: store <4 x i8> [[TMP30]], <4 x i8>* [[TMP31]], align 1
	; SSE-NEXT: [[ARRAYIDX93:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 8			; SSE-NEXT: [[ARRAYIDX93:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 8
	; SSE-NEXT: [[ARRAYIDX95:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 8			; SSE-NEXT: [[ARRAYIDX95:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 8
	; SSE-NEXT: [[ARRAYIDX97:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 8			; SSE-NEXT: [[ARRAYIDX97:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 8
	; SSE-NEXT: [[ARRAYIDX100:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 8			; SSE-NEXT: [[ARRAYIDX100:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 8
	; SSE-NEXT: [[ARRAYIDX104:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 8			; SSE-NEXT: [[ARRAYIDX104:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 8
	; SSE-NEXT: [[ARRAYIDX105:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 9			; SSE-NEXT: [[ARRAYIDX105:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 9
	; SSE-NEXT: [[ARRAYIDX107:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 9			; SSE-NEXT: [[ARRAYIDX107:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 9
	; SSE-NEXT: [[ARRAYIDX109:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 9			; SSE-NEXT: [[ARRAYIDX109:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 9
	; SSE-NEXT: [[ARRAYIDX112:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 9			; SSE-NEXT: [[ARRAYIDX112:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 9
	; SSE-NEXT: [[ARRAYIDX116:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 9			; SSE-NEXT: [[ARRAYIDX116:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 9
	; SSE-NEXT: [[ARRAYIDX117:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 10			; SSE-NEXT: [[ARRAYIDX117:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 10
	; SSE-NEXT: [[ARRAYIDX119:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 10			; SSE-NEXT: [[ARRAYIDX119:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 10
	; SSE-NEXT: [[ARRAYIDX121:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 10			; SSE-NEXT: [[ARRAYIDX121:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 10
	; SSE-NEXT: [[ARRAYIDX124:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 10			; SSE-NEXT: [[ARRAYIDX124:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 10
	; SSE-NEXT: [[ARRAYIDX128:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 10			; SSE-NEXT: [[ARRAYIDX128:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 10
	; SSE-NEXT: [[ARRAYIDX129:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 11			; SSE-NEXT: [[ARRAYIDX129:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 11
				; SSE-NEXT: [[ARRAYIDX131:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 11
				; SSE-NEXT: [[ARRAYIDX133:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 11
				; SSE-NEXT: [[ARRAYIDX136:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 11
				; SSE-NEXT: [[ARRAYIDX140:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 11
	; SSE-NEXT: [[TMP32:%.*]] = bitcast i8* [[ARRAYIDX93]] to <4 x i8>*			; SSE-NEXT: [[TMP32:%.*]] = bitcast i8* [[ARRAYIDX93]] to <4 x i8>*
	; SSE-NEXT: [[TMP33:%.*]] = load <4 x i8>, <4 x i8>* [[TMP32]], align 1			; SSE-NEXT: [[TMP33:%.*]] = load <4 x i8>, <4 x i8>* [[TMP32]], align 1
	; SSE-NEXT: [[ARRAYIDX131:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 11
	; SSE-NEXT: [[TMP34:%.*]] = bitcast i8* [[ARRAYIDX95]] to <4 x i8>*			; SSE-NEXT: [[TMP34:%.*]] = bitcast i8* [[ARRAYIDX95]] to <4 x i8>*
	; SSE-NEXT: [[TMP35:%.*]] = load <4 x i8>, <4 x i8>* [[TMP34]], align 1			; SSE-NEXT: [[TMP35:%.*]] = load <4 x i8>, <4 x i8>* [[TMP34]], align 1
	; SSE-NEXT: [[ARRAYIDX133:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 11
	; SSE-NEXT: [[TMP36:%.*]] = bitcast i8* [[ARRAYIDX97]] to <4 x i8>*			; SSE-NEXT: [[TMP36:%.*]] = bitcast i8* [[ARRAYIDX97]] to <4 x i8>*
	; SSE-NEXT: [[TMP37:%.*]] = load <4 x i8>, <4 x i8>* [[TMP36]], align 1			; SSE-NEXT: [[TMP37:%.*]] = load <4 x i8>, <4 x i8>* [[TMP36]], align 1
	; SSE-NEXT: [[ARRAYIDX136:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 11
	; SSE-NEXT: [[TMP38:%.*]] = bitcast i8* [[ARRAYIDX100]] to <4 x i8>*			; SSE-NEXT: [[TMP38:%.*]] = bitcast i8* [[ARRAYIDX100]] to <4 x i8>*
	; SSE-NEXT: [[TMP39:%.*]] = load <4 x i8>, <4 x i8>* [[TMP38]], align 1			; SSE-NEXT: [[TMP39:%.*]] = load <4 x i8>, <4 x i8>* [[TMP38]], align 1
	; SSE-NEXT: [[TMP40:%.*]] = icmp ult <4 x i8> [[TMP33]], [[TMP35]]			; SSE-NEXT: [[TMP40:%.*]] = icmp ult <4 x i8> [[TMP33]], [[TMP35]]
	; SSE-NEXT: [[TMP41:%.*]] = select <4 x i1> [[TMP40]], <4 x i8> [[TMP39]], <4 x i8> [[TMP37]]			; SSE-NEXT: [[TMP41:%.*]] = select <4 x i1> [[TMP40]], <4 x i8> [[TMP39]], <4 x i8> [[TMP37]]
	; SSE-NEXT: [[TMP42:%.*]] = zext <4 x i8> [[TMP41]] to <4 x i32>			; SSE-NEXT: [[TMP42:%.*]] = zext <4 x i8> [[TMP41]] to <4 x i32>
	; SSE-NEXT: [[TMP43:%.*]] = mul <4 x i32> [[TMP42]], [[SHUFFLE2]]			; SSE-NEXT: [[TMP43:%.*]] = mul <4 x i32> [[TMP42]], [[SHUFFLE2]]
	; SSE-NEXT: [[TMP44:%.*]] = trunc <4 x i32> [[TMP43]] to <4 x i8>			; SSE-NEXT: [[TMP44:%.*]] = trunc <4 x i32> [[TMP43]] to <4 x i8>
	; SSE-NEXT: [[ARRAYIDX140:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 11
	; SSE-NEXT: [[TMP45:%.*]] = bitcast i8* [[ARRAYIDX104]] to <4 x i8>*			; SSE-NEXT: [[TMP45:%.*]] = bitcast i8* [[ARRAYIDX104]] to <4 x i8>*
	; SSE-NEXT: store <4 x i8> [[TMP44]], <4 x i8>* [[TMP45]], align 1			; SSE-NEXT: store <4 x i8> [[TMP44]], <4 x i8>* [[TMP45]], align 1
	; SSE-NEXT: [[ARRAYIDX141:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 12			; SSE-NEXT: [[ARRAYIDX141:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 12
	; SSE-NEXT: [[ARRAYIDX143:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 12			; SSE-NEXT: [[ARRAYIDX143:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 12
	; SSE-NEXT: [[ARRAYIDX145:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 12			; SSE-NEXT: [[ARRAYIDX145:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 12
	; SSE-NEXT: [[ARRAYIDX148:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 12			; SSE-NEXT: [[ARRAYIDX148:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 12
	; SSE-NEXT: [[ARRAYIDX152:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 12			; SSE-NEXT: [[ARRAYIDX152:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 12
	; SSE-NEXT: [[ARRAYIDX153:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 13			; SSE-NEXT: [[ARRAYIDX153:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 13
	; SSE-NEXT: [[ARRAYIDX155:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 13			; SSE-NEXT: [[ARRAYIDX155:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 13
	; SSE-NEXT: [[ARRAYIDX157:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 13			; SSE-NEXT: [[ARRAYIDX157:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 13
	; SSE-NEXT: [[ARRAYIDX160:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 13			; SSE-NEXT: [[ARRAYIDX160:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 13
	; SSE-NEXT: [[ARRAYIDX164:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 13			; SSE-NEXT: [[ARRAYIDX164:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 13
	; SSE-NEXT: [[ARRAYIDX165:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 14			; SSE-NEXT: [[ARRAYIDX165:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 14
	; SSE-NEXT: [[ARRAYIDX167:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 14			; SSE-NEXT: [[ARRAYIDX167:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 14
	; SSE-NEXT: [[ARRAYIDX169:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 14			; SSE-NEXT: [[ARRAYIDX169:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 14
	; SSE-NEXT: [[ARRAYIDX172:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 14			; SSE-NEXT: [[ARRAYIDX172:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 14
	; SSE-NEXT: [[ARRAYIDX176:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 14			; SSE-NEXT: [[ARRAYIDX176:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 14
	; SSE-NEXT: [[ARRAYIDX177:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 15			; SSE-NEXT: [[ARRAYIDX177:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 15
				; SSE-NEXT: [[ARRAYIDX179:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 15
				; SSE-NEXT: [[ARRAYIDX181:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 15
				; SSE-NEXT: [[ARRAYIDX184:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 15
				; SSE-NEXT: [[ARRAYIDX188:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 15
	; SSE-NEXT: [[TMP46:%.*]] = bitcast i8* [[ARRAYIDX141]] to <4 x i8>*			; SSE-NEXT: [[TMP46:%.*]] = bitcast i8* [[ARRAYIDX141]] to <4 x i8>*
	; SSE-NEXT: [[TMP47:%.*]] = load <4 x i8>, <4 x i8>* [[TMP46]], align 1			; SSE-NEXT: [[TMP47:%.*]] = load <4 x i8>, <4 x i8>* [[TMP46]], align 1
	; SSE-NEXT: [[ARRAYIDX179:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 15
	; SSE-NEXT: [[TMP48:%.*]] = bitcast i8* [[ARRAYIDX143]] to <4 x i8>*			; SSE-NEXT: [[TMP48:%.*]] = bitcast i8* [[ARRAYIDX143]] to <4 x i8>*
	; SSE-NEXT: [[TMP49:%.*]] = load <4 x i8>, <4 x i8>* [[TMP48]], align 1			; SSE-NEXT: [[TMP49:%.*]] = load <4 x i8>, <4 x i8>* [[TMP48]], align 1
	; SSE-NEXT: [[ARRAYIDX181:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 15
	; SSE-NEXT: [[TMP50:%.*]] = bitcast i8* [[ARRAYIDX145]] to <4 x i8>*			; SSE-NEXT: [[TMP50:%.*]] = bitcast i8* [[ARRAYIDX145]] to <4 x i8>*
	; SSE-NEXT: [[TMP51:%.*]] = load <4 x i8>, <4 x i8>* [[TMP50]], align 1			; SSE-NEXT: [[TMP51:%.*]] = load <4 x i8>, <4 x i8>* [[TMP50]], align 1
	; SSE-NEXT: [[ARRAYIDX184:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 15
	; SSE-NEXT: [[TMP52:%.*]] = bitcast i8* [[ARRAYIDX148]] to <4 x i8>*			; SSE-NEXT: [[TMP52:%.*]] = bitcast i8* [[ARRAYIDX148]] to <4 x i8>*
	; SSE-NEXT: [[TMP53:%.*]] = load <4 x i8>, <4 x i8>* [[TMP52]], align 1			; SSE-NEXT: [[TMP53:%.*]] = load <4 x i8>, <4 x i8>* [[TMP52]], align 1
	; SSE-NEXT: [[TMP54:%.*]] = icmp ult <4 x i8> [[TMP47]], [[TMP49]]			; SSE-NEXT: [[TMP54:%.*]] = icmp ult <4 x i8> [[TMP47]], [[TMP49]]
	; SSE-NEXT: [[TMP55:%.*]] = select <4 x i1> [[TMP54]], <4 x i8> [[TMP53]], <4 x i8> [[TMP51]]			; SSE-NEXT: [[TMP55:%.*]] = select <4 x i1> [[TMP54]], <4 x i8> [[TMP53]], <4 x i8> [[TMP51]]
	; SSE-NEXT: [[TMP56:%.*]] = zext <4 x i8> [[TMP55]] to <4 x i32>			; SSE-NEXT: [[TMP56:%.*]] = zext <4 x i8> [[TMP55]] to <4 x i32>
	; SSE-NEXT: [[TMP57:%.*]] = mul <4 x i32> [[TMP56]], [[SHUFFLE3]]			; SSE-NEXT: [[TMP57:%.*]] = mul <4 x i32> [[TMP56]], [[SHUFFLE3]]
	; SSE-NEXT: [[TMP58:%.*]] = trunc <4 x i32> [[TMP57]] to <4 x i8>			; SSE-NEXT: [[TMP58:%.*]] = trunc <4 x i32> [[TMP57]] to <4 x i8>
	; SSE-NEXT: [[ARRAYIDX188:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 15
	; SSE-NEXT: [[TMP59:%.*]] = bitcast i8* [[ARRAYIDX152]] to <4 x i8>*			; SSE-NEXT: [[TMP59:%.*]] = bitcast i8* [[ARRAYIDX152]] to <4 x i8>*
	; SSE-NEXT: store <4 x i8> [[TMP58]], <4 x i8>* [[TMP59]], align 1			; SSE-NEXT: store <4 x i8> [[TMP58]], <4 x i8>* [[TMP59]], align 1
	; SSE-NEXT: [[INC]] = add nuw nsw i32 [[I_0356]], 1			; SSE-NEXT: [[INC]] = add nuw nsw i32 [[I_0356]], 1
	; SSE-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 16			; SSE-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 16
	; SSE-NEXT: [[ADD_PTR189]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 16			; SSE-NEXT: [[ADD_PTR189]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 16
	; SSE-NEXT: [[ADD_PTR190]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 16			; SSE-NEXT: [[ADD_PTR190]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 16
	; SSE-NEXT: [[ADD_PTR191]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 16			; SSE-NEXT: [[ADD_PTR191]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 16
	; SSE-NEXT: [[ADD_PTR192]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 16			; SSE-NEXT: [[ADD_PTR192]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 16
	[...]
	; AVX512-NEXT: [[ARRAYIDX160:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 13			; AVX512-NEXT: [[ARRAYIDX160:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 13
	; AVX512-NEXT: [[ARRAYIDX164:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 13			; AVX512-NEXT: [[ARRAYIDX164:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 13
	; AVX512-NEXT: [[ARRAYIDX165:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 14			; AVX512-NEXT: [[ARRAYIDX165:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 14
	; AVX512-NEXT: [[ARRAYIDX167:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 14			; AVX512-NEXT: [[ARRAYIDX167:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 14
	; AVX512-NEXT: [[ARRAYIDX169:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 14			; AVX512-NEXT: [[ARRAYIDX169:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 14
	; AVX512-NEXT: [[ARRAYIDX172:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 14			; AVX512-NEXT: [[ARRAYIDX172:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 14
	; AVX512-NEXT: [[ARRAYIDX176:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 14			; AVX512-NEXT: [[ARRAYIDX176:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 14
	; AVX512-NEXT: [[ARRAYIDX177:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 15			; AVX512-NEXT: [[ARRAYIDX177:%.*]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 15
				; AVX512-NEXT: [[ARRAYIDX179:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 15
				; AVX512-NEXT: [[ARRAYIDX181:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 15
				; AVX512-NEXT: [[ARRAYIDX184:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 15
				; AVX512-NEXT: [[ARRAYIDX188:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 15
	; AVX512-NEXT: [[TMP1:%.*]] = bitcast i8* [[C_ADDR_0352]] to <16 x i8>*			; AVX512-NEXT: [[TMP1:%.*]] = bitcast i8* [[C_ADDR_0352]] to <16 x i8>*
	; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[TMP1]], align 1			; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* [[TMP1]], align 1
	; AVX512-NEXT: [[ARRAYIDX179:%.*]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 15
	; AVX512-NEXT: [[TMP3:%.*]] = bitcast i8* [[D_ADDR_0353]] to <16 x i8>*			; AVX512-NEXT: [[TMP3:%.*]] = bitcast i8* [[D_ADDR_0353]] to <16 x i8>*
	; AVX512-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[TMP3]], align 1			; AVX512-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[TMP3]], align 1
	; AVX512-NEXT: [[ARRAYIDX181:%.*]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 15
	; AVX512-NEXT: [[TMP5:%.*]] = bitcast i8* [[A_ADDR_0355]] to <16 x i8>*			; AVX512-NEXT: [[TMP5:%.*]] = bitcast i8* [[A_ADDR_0355]] to <16 x i8>*
	; AVX512-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* [[TMP5]], align 1			; AVX512-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* [[TMP5]], align 1
	; AVX512-NEXT: [[ARRAYIDX184:%.*]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 15
	; AVX512-NEXT: [[TMP7:%.*]] = bitcast i8* [[B_ADDR_0351]] to <16 x i8>*			; AVX512-NEXT: [[TMP7:%.*]] = bitcast i8* [[B_ADDR_0351]] to <16 x i8>*
	; AVX512-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* [[TMP7]], align 1			; AVX512-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* [[TMP7]], align 1
	; AVX512-NEXT: [[TMP9:%.*]] = icmp ult <16 x i8> [[TMP2]], [[TMP4]]			; AVX512-NEXT: [[TMP9:%.*]] = icmp ult <16 x i8> [[TMP2]], [[TMP4]]
	; AVX512-NEXT: [[TMP10:%.*]] = select <16 x i1> [[TMP9]], <16 x i8> [[TMP8]], <16 x i8> [[TMP6]]			; AVX512-NEXT: [[TMP10:%.*]] = select <16 x i1> [[TMP9]], <16 x i8> [[TMP8]], <16 x i8> [[TMP6]]
	; AVX512-NEXT: [[TMP11:%.*]] = zext <16 x i8> [[TMP10]] to <16 x i32>			; AVX512-NEXT: [[TMP11:%.*]] = zext <16 x i8> [[TMP10]] to <16 x i32>
	; AVX512-NEXT: [[TMP12:%.*]] = mul <16 x i32> [[TMP11]], [[SHUFFLE]]			; AVX512-NEXT: [[TMP12:%.*]] = mul <16 x i32> [[TMP11]], [[SHUFFLE]]
	; AVX512-NEXT: [[TMP13:%.*]] = trunc <16 x i32> [[TMP12]] to <16 x i8>			; AVX512-NEXT: [[TMP13:%.*]] = trunc <16 x i32> [[TMP12]] to <16 x i8>
	; AVX512-NEXT: [[ARRAYIDX188:%.*]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 15
	; AVX512-NEXT: [[TMP14:%.*]] = bitcast i8* [[E_ADDR_0354]] to <16 x i8>*			; AVX512-NEXT: [[TMP14:%.*]] = bitcast i8* [[E_ADDR_0354]] to <16 x i8>*
	; AVX512-NEXT: store <16 x i8> [[TMP13]], <16 x i8>* [[TMP14]], align 1			; AVX512-NEXT: store <16 x i8> [[TMP13]], <16 x i8>* [[TMP14]], align 1
	; AVX512-NEXT: [[INC]] = add nuw nsw i32 [[I_0356]], 1			; AVX512-NEXT: [[INC]] = add nuw nsw i32 [[I_0356]], 1
	; AVX512-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 16			; AVX512-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[A_ADDR_0355]], i64 16
	; AVX512-NEXT: [[ADD_PTR189]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 16			; AVX512-NEXT: [[ADD_PTR189]] = getelementptr inbounds i8, i8* [[B_ADDR_0351]], i64 16
	; AVX512-NEXT: [[ADD_PTR190]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 16			; AVX512-NEXT: [[ADD_PTR190]] = getelementptr inbounds i8, i8* [[C_ADDR_0352]], i64 16
	; AVX512-NEXT: [[ADD_PTR191]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 16			; AVX512-NEXT: [[ADD_PTR191]] = getelementptr inbounds i8, i8* [[D_ADDR_0353]], i64 16
	; AVX512-NEXT: [[ADD_PTR192]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 16			; AVX512-NEXT: [[ADD_PTR192]] = getelementptr inbounds i8, i8* [[E_ADDR_0354]], i64 16

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

	; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0			; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0
	; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1			; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1
	; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2			; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2
	; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3			; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[C3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[A1]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <2 x i32> [[TMP5]], zeroinitializer			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[B1]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[A1]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP5]], <2 x float> [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[B1]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 [[C3]], i32 1
	; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP8]], <2 x float> [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = icmp ne <2 x i32> [[TMP10]], zeroinitializer
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
	; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP6]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]			; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
	; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP18]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP18]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x float> [[RD1]]			; CHECK-NEXT: ret <4 x float> [[RD1]]
	;			;
	%c0 = extractelement <4 x i32> %c, i32 0			%c0 = extractelement <4 x i32> %c, i32 0
	%c1 = extractelement <4 x i32> %c, i32 1			%c1 = extractelement <4 x i32> %c, i32 1
	%c2 = extractelement <4 x i32> %c, i32 2			%c2 = extractelement <4 x i32> %c, i32 2
	%c3 = extractelement <4 x i32> %c, i32 3			%c3 = extractelement <4 x i32> %c, i32 3
	[...]
	; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1			; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1
	; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0			; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0
	; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3			; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3
	; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2			; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2
	; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1			; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1
	; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0			; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0
	; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0			; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0
	; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1			; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1
	; MINTREESIZE-NEXT: [[TMP11:%.*]] = fadd <4 x float> [[A]], [[B]]			; MINTREESIZE-NEXT: [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
	; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0			; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[TMP3]], i32 1
	; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[TMP3]], i32 1			; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0
	; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0			; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> [[TMP13]], float [[TMP2]], i32 1
	; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[TMP2]], i32 1			; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
	; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0			; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> [[TMP15]], float [[TMP1]], i32 1
	; MINTREESIZE-NEXT: [[TMP17:%.*]] = insertelement <2 x float> [[TMP16]], float [[TMP1]], i32 1			; MINTREESIZE-NEXT: [[TMP17:%.*]] = fadd <4 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: ret <4 x float> [[TMP11]]			; MINTREESIZE-NEXT: ret <4 x float> [[TMP17]]
	;			;
	%a0 = extractelement <4 x float> %a, i32 0			%a0 = extractelement <4 x float> %a, i32 0
	%b0 = extractelement <4 x float> %b, i32 0			%b0 = extractelement <4 x float> %b, i32 0
	%c0 = fadd float %a0, %b0			%c0 = fadd float %a0, %b0
	%v0 = insertelement <4 x float> poison, float %c0, i32 0			%v0 = insertelement <4 x float> poison, float %c0, i32 0
	%a1 = extractelement <4 x float> %a, i32 1			%a1 = extractelement <4 x float> %a, i32 1
	%b1 = extractelement <4 x float> %b, i32 1			%b1 = extractelement <4 x float> %b, i32 1
	%c1 = fadd float %a1, %b1			%c1 = fadd float %a1, %b1

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

	; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3			; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
	; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0			; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> [[B:%.*]], i32 0
	; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1			; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> [[B]], i32 1
	; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2			; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> [[B]], i32 2
	; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3			; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> [[B]], i32 3
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[C0]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[C3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[A1]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <2 x i32> [[TMP5]], zeroinitializer			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[A0]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[B1]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[A1]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP5]], <2 x float> [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[B0]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> poison, i32 [[C2]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[B1]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 [[C3]], i32 1
	; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP8]], <2 x float> [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = icmp ne <2 x i32> [[TMP10]], zeroinitializer
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
	; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP6]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]			; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
	; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP18]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP18]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x float> [[RD1]]			; CHECK-NEXT: ret <4 x float> [[RD1]]
	;			;
	%c0 = extractelement <4 x i32> %c, i32 0			%c0 = extractelement <4 x i32> %c, i32 0
	%c1 = extractelement <4 x i32> %c, i32 1			%c1 = extractelement <4 x i32> %c, i32 1
	%c2 = extractelement <4 x i32> %c, i32 2			%c2 = extractelement <4 x i32> %c, i32 2
	%c3 = extractelement <4 x i32> %c, i32 3			%c3 = extractelement <4 x i32> %c, i32 3
	; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1			; MINTREESIZE-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[B]], i32 1
	; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0			; MINTREESIZE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[B]], i32 0
	; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3			; MINTREESIZE-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[A:%.*]], i32 3
	; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2			; MINTREESIZE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[A]], i32 2
	; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1			; MINTREESIZE-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[A]], i32 1
	; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0			; MINTREESIZE-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[A]], i32 0
	; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0			; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[TMP8]], i32 0
	; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1			; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP4]], i32 1
	; MINTREESIZE-NEXT: [[TMP11:%.*]] = fadd <4 x float> [[A]], [[B]]			; MINTREESIZE-NEXT: [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
	; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0			; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[TMP3]], i32 1
	; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[TMP3]], i32 1			; MINTREESIZE-NEXT: [[TMP13:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0
	; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[TMP6]], i32 0			; MINTREESIZE-NEXT: [[TMP14:%.*]] = insertelement <2 x float> [[TMP13]], float [[TMP2]], i32 1
	; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[TMP2]], i32 1			; MINTREESIZE-NEXT: [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
	; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0			; MINTREESIZE-NEXT: [[TMP16:%.*]] = insertelement <2 x float> [[TMP15]], float [[TMP1]], i32 1
	; MINTREESIZE-NEXT: [[TMP17:%.*]] = insertelement <2 x float> [[TMP16]], float [[TMP1]], i32 1			; MINTREESIZE-NEXT: [[TMP17:%.*]] = fadd <4 x float> [[A]], [[B]]
	; MINTREESIZE-NEXT: ret <4 x float> [[TMP11]]			; MINTREESIZE-NEXT: ret <4 x float> [[TMP17]]
	;			;
	%a0 = extractelement <4 x float> %a, i32 0			%a0 = extractelement <4 x float> %a, i32 0
	%b0 = extractelement <4 x float> %b, i32 0			%b0 = extractelement <4 x float> %b, i32 0
	%c0 = fadd float %a0, %b0			%c0 = fadd float %a0, %b0
	%v0 = insertelement <4 x float> undef, float %c0, i32 0			%v0 = insertelement <4 x float> undef, float %c0, i32 0
	%a1 = extractelement <4 x float> %a, i32 1			%a1 = extractelement <4 x float> %a, i32 1
	%b1 = extractelement <4 x float> %b, i32 1			%b1 = extractelement <4 x float> %b, i32 1
	%c1 = fadd float %a1, %b1			%c1 = fadd float %a1, %b1

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	%struct.sw = type { float, float, float, float }			%struct.sw = type { float, float, float, float }

	define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {			define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0			; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_SW]], %struct.sw* [[V]], i64 0, i32 1			; CHECK-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_SW]], %struct.sw* [[V]], i64 0, i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[X]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 16			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[X]] to <2 x float>*
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 16
	; CHECK-NEXT: [[TMP3:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP1]], i32 1
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[TMP6]], poison			; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[TMP6]], poison
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], poison			; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], poison
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], poison			; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], poison
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[TMP9]], i32 0
	; CHECK-NEXT: [[VEC1:%.*]] = insertelement <2 x float> undef, float [[TMP10]], i32 0			; CHECK-NEXT: [[VEC1:%.*]] = insertelement <2 x float> undef, float [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP9]], i32 1

llvm/test/Transforms/SLPVectorizer/X86/insertvalue.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7-avx | FileCheck %s

	define void @julia_2xdouble([2 x double]* sret([2 x double]), [2 x double]*, [2 x double]*, [2 x double]*) {			define void @julia_2xdouble([2 x double]* sret([2 x double]), [2 x double]*, [2 x double]*, [2 x double]*) {
	; CHECK-LABEL: @julia_2xdouble(			; CHECK-LABEL: @julia_2xdouble(
	; CHECK-NEXT: top:			; CHECK-NEXT: top:
	; CHECK-NEXT: [[PX0:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP2:%.*]], i64 0, i64 0			; CHECK-NEXT: [[PX0:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP2:%.*]], i64 0, i64 0
	; CHECK-NEXT: [[PY0:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP3:%.*]], i64 0, i64 0			; CHECK-NEXT: [[PY0:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP3:%.*]], i64 0, i64 0
	; CHECK-NEXT: [[PX1:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP2]], i64 0, i64 1			; CHECK-NEXT: [[PX1:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP2]], i64 0, i64 1
				; CHECK-NEXT: [[PY1:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP3]], i64 0, i64 1
				; CHECK-NEXT: [[PZ0:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP1:%.*]], i64 0, i64 0
				; CHECK-NEXT: [[PZ1:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP1]], i64 0, i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[PX0]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[PX0]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 4			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 4
	; CHECK-NEXT: [[PY1:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP3]], i64 0, i64 1
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[PY0]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[PY0]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[PZ0:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP1:%.*]], i64 0, i64 0
	; CHECK-NEXT: [[PZ1:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[TMP1]], i64 0, i64 1
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[PZ0]] to <2 x double>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[PZ0]] to <2 x double>*
	; CHECK-NEXT: [[TMP10:%.*]] = load <2 x double>, <2 x double>* [[TMP9]], align 4			; CHECK-NEXT: [[TMP10:%.*]] = load <2 x double>, <2 x double>* [[TMP9]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP8]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP8]], [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP11]], i32 0
	; CHECK-NEXT: [[I0:%.*]] = insertvalue [2 x double] undef, double [[TMP12]], 0			; CHECK-NEXT: [[I0:%.*]] = insertvalue [2 x double] undef, double [[TMP12]], 0
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x double> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x double> [[TMP11]], i32 1
	; CHECK-NEXT: [[I1:%.*]] = insertvalue [2 x double] [[I0]], double [[TMP13]], 1			; CHECK-NEXT: [[I1:%.*]] = insertvalue [2 x double] [[I0]], double [[TMP13]], 1
	; CHECK-NEXT: store [2 x double] [[I1]], [2 x double]* [[TMP0:%.*]], align 4			; CHECK-NEXT: store [2 x double] [[I1]], [2 x double]* [[TMP0:%.*]], align 4
	; CHECK-NEXT: top:			; CHECK-NEXT: top:
	; CHECK-NEXT: [[PX0:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP2:%.*]], i64 0, i64 0			; CHECK-NEXT: [[PX0:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP2:%.*]], i64 0, i64 0
	; CHECK-NEXT: [[PY0:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP3:%.*]], i64 0, i64 0			; CHECK-NEXT: [[PY0:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP3:%.*]], i64 0, i64 0
	; CHECK-NEXT: [[PX1:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP2]], i64 0, i64 1			; CHECK-NEXT: [[PX1:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP2]], i64 0, i64 1
	; CHECK-NEXT: [[PY1:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP3]], i64 0, i64 1			; CHECK-NEXT: [[PY1:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP3]], i64 0, i64 1
	; CHECK-NEXT: [[PX2:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP2]], i64 0, i64 2			; CHECK-NEXT: [[PX2:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP2]], i64 0, i64 2
	; CHECK-NEXT: [[PY2:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP3]], i64 0, i64 2			; CHECK-NEXT: [[PY2:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP3]], i64 0, i64 2
	; CHECK-NEXT: [[PX3:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP2]], i64 0, i64 3			; CHECK-NEXT: [[PX3:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP2]], i64 0, i64 3
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[PX0]] to <4 x float>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* [[TMP4]], align 4
	; CHECK-NEXT: [[PY3:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP3]], i64 0, i64 3			; CHECK-NEXT: [[PY3:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP3]], i64 0, i64 3
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[PY0]] to <4 x float>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = fmul <4 x float> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[PZ0:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP1:%.*]], i64 0, i64 0			; CHECK-NEXT: [[PZ0:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP1:%.*]], i64 0, i64 0
	; CHECK-NEXT: [[PZ1:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP1]], i64 0, i64 1			; CHECK-NEXT: [[PZ1:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP1]], i64 0, i64 1
	; CHECK-NEXT: [[PZ2:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP1]], i64 0, i64 2			; CHECK-NEXT: [[PZ2:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP1]], i64 0, i64 2
	; CHECK-NEXT: [[PZ3:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP1]], i64 0, i64 3			; CHECK-NEXT: [[PZ3:%.*]] = getelementptr inbounds [4 x float], [4 x float]* [[TMP1]], i64 0, i64 3
				; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[PX0]] to <4 x float>*
				; CHECK-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* [[TMP4]], align 4
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[PY0]] to <4 x float>*
				; CHECK-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* [[TMP6]], align 4
				; CHECK-NEXT: [[TMP8:%.*]] = fmul <4 x float> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[PZ0]] to <4 x float>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[PZ0]] to <4 x float>*
	; CHECK-NEXT: [[TMP10:%.*]] = load <4 x float>, <4 x float>* [[TMP9]], align 4			; CHECK-NEXT: [[TMP10:%.*]] = load <4 x float>, <4 x float>* [[TMP9]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <4 x float> [[TMP8]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd <4 x float> [[TMP8]], [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP11]], i32 0
	; CHECK-NEXT: [[I0:%.*]] = insertvalue [4 x float] undef, float [[TMP12]], 0			; CHECK-NEXT: [[I0:%.*]] = insertvalue [4 x float] undef, float [[TMP12]], 0
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP11]], i32 1
	; CHECK-NEXT: [[I1:%.*]] = insertvalue [4 x float] [[I0]], float [[TMP13]], 1			; CHECK-NEXT: [[I1:%.*]] = insertvalue [4 x float] [[I0]], float [[TMP13]], 1
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP11]], i32 2			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP11]], i32 2

llvm/test/Transforms/SLPVectorizer/X86/inst_size_bug.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 -slp-max-reg-size=128 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 -slp-max-reg-size=128 | FileCheck %s

	define void @inst_size(i64* %a, <2 x i64> %b) {			define void @inst_size(i64* %a, <2 x i64> %b) {
	; CHECK-LABEL: @inst_size(			; CHECK-LABEL: @inst_size(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[VAL:%.*]] = extractelement <2 x i64> [[B:%.*]], i32 0			; CHECK-NEXT: [[VAL:%.*]] = extractelement <2 x i64> [[B:%.*]], i32 0
	; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds i64, i64* [[A:%.*]], i64 1			; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds i64, i64* [[A:%.*]], i64 1
	; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 2			; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 2
	; CHECK-NEXT: [[PTR4:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 3			; CHECK-NEXT: [[PTR4:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 3
				; CHECK-NEXT: [[T41:%.*]] = icmp sgt i64 0, [[VAL]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[A]] to <4 x i64>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[A]] to <4 x i64>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* [[TMP0]], align 4
	; CHECK-NEXT: [[T41:%.*]] = icmp sgt i64 0, [[VAL]]
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i64> zeroinitializer, [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i64> zeroinitializer, [[TMP1]]
	; CHECK-NEXT: br label [[BLOCK:%.*]]			; CHECK-NEXT: br label [[BLOCK:%.*]]
	; CHECK: block:			; CHECK: block:
	; CHECK-NEXT: [[PHI1:%.*]] = phi i1 [ [[T41]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[PHI1:%.*]] = phi i1 [ [[T41]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = phi <4 x i1> [ [[TMP2]], [[ENTRY]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <4 x i1> [ [[TMP2]], [[ENTRY]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:

llvm/test/Transforms/SLPVectorizer/X86/intrinsic_with_scalar_param.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -slp-threshold=-8 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -slp-threshold=-8 | FileCheck %s

	declare float @llvm.powi.f32.i32(float, i32)			declare float @llvm.powi.f32.i32(float, i32)
	define void @vec_powi_f32(float* %a, float* %c, i32 %P) {			define void @vec_powi_f32(float* %a, float* %c, i32 %P) {
	; CHECK-LABEL: @vec_powi_f32(			; CHECK-LABEL: @vec_powi_f32(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i32 1			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i32 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[A]], i32 2			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[A]], i32 2
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i32 3			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i32 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[A]] to <4 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.powi.v4f32.i32(<4 x float> [[TMP1]], i32 [[P:%.*]])
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds float, float* [[C:%.*]], i32 1			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds float, float* [[C:%.*]], i32 1
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[C]], i32 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[C]], i32 2
	; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds float, float* [[C]], i32 3			; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds float, float* [[C]], i32 3
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[A]] to <4 x float>*
				; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
				; CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.powi.v4f32.i32(<4 x float> [[TMP1]], i32 [[P:%.*]])
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[C]] to <4 x float>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[C]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4			; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%i0 = load float, float* %a, align 4			%i0 = load float, float* %a, align 4
	%call1 = tail call float @llvm.powi.f32.i32(float %i0,i32 %P)			%call1 = tail call float @llvm.powi.f32.i32(float %i0,i32 %P)


llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-shuffle-placement.ll

	; CHECK-LABEL: @jumble1(			; CHECK-LABEL: @jumble1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 10			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 10
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 11			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 11
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 1
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 12			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 12
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 13			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 13
				; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
				; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
				; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
				; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
	; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[TMP1]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %A, i64 10			%arrayidx = getelementptr inbounds i32, i32* %A, i64 10
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%1 = load i32, i32* %A, align 4			%1 = load i32, i32* %A, align 4
	; CHECK-LABEL: @jumble2(			; CHECK-LABEL: @jumble2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 10			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 10
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 11			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 11
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 1
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 12			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 12
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 13			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 13
				; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
				; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
				; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
				; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 2>
	; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[SHUFFLE]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[SHUFFLE]], [[TMP1]]
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %A, i64 10			%arrayidx = getelementptr inbounds i32, i32* %A, i64 10
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%1 = load i32, i32* %A, align 4			%1 = load i32, i32* %A, align 4

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -S -mtriple=x86_64-unknown -mattr=+avx -slp-vectorizer | FileCheck %s			; RUN: opt < %s -S -mtriple=x86_64-unknown -mattr=+avx -slp-vectorizer | FileCheck %s



	define i32 @jumbled-load(i32* noalias nocapture %in, i32* noalias nocapture %inn, i32* noalias nocapture %out) {			define i32 @jumbled-load(i32* noalias nocapture %in, i32* noalias nocapture %inn, i32* noalias nocapture %out) {
	; CHECK-LABEL: @jumbled-load(			; CHECK-LABEL: @jumbled-load(
	; CHECK-NEXT: [[IN_ADDR:%.]] = getelementptr inbounds i32, i32 [[IN:%.*]], i64 0			; CHECK-NEXT: [[IN_ADDR:%.]] = getelementptr inbounds i32, i32 [[IN:%.*]], i64 0
	; CHECK-NEXT: [[GEP_1:%.]] = getelementptr inbounds i32, i32 [[IN_ADDR]], i64 3			; CHECK-NEXT: [[GEP_1:%.]] = getelementptr inbounds i32, i32 [[IN_ADDR]], i64 3
	; CHECK-NEXT: [[GEP_2:%.]] = getelementptr inbounds i32, i32 [[IN_ADDR]], i64 1			; CHECK-NEXT: [[GEP_2:%.]] = getelementptr inbounds i32, i32 [[IN_ADDR]], i64 1
	; CHECK-NEXT: [[GEP_3:%.]] = getelementptr inbounds i32, i32 [[IN_ADDR]], i64 2			; CHECK-NEXT: [[GEP_3:%.]] = getelementptr inbounds i32, i32 [[IN_ADDR]], i64 2
	; CHECK-NEXT: [[TMP1:%.]] = bitcast i32 [[IN_ADDR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP2:%.]] = load <4 x i32>, <4 x i32> [[TMP1]], align 4
	; CHECK-NEXT: [[INN_ADDR:%.]] = getelementptr inbounds i32, i32 [[INN:%.*]], i64 0			; CHECK-NEXT: [[INN_ADDR:%.]] = getelementptr inbounds i32, i32 [[INN:%.*]], i64 0
	; CHECK-NEXT: [[GEP_4:%.]] = getelementptr inbounds i32, i32 [[INN_ADDR]], i64 2			; CHECK-NEXT: [[GEP_4:%.]] = getelementptr inbounds i32, i32 [[INN_ADDR]], i64 2
	; CHECK-NEXT: [[GEP_5:%.]] = getelementptr inbounds i32, i32 [[INN_ADDR]], i64 3			; CHECK-NEXT: [[GEP_5:%.]] = getelementptr inbounds i32, i32 [[INN_ADDR]], i64 3
	; CHECK-NEXT: [[GEP_6:%.]] = getelementptr inbounds i32, i32 [[INN_ADDR]], i64 1			; CHECK-NEXT: [[GEP_6:%.]] = getelementptr inbounds i32, i32 [[INN_ADDR]], i64 1
	; CHECK-NEXT: [[TMP3:%.]] = bitcast i32 [[INN_ADDR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP4:%.]] = load <4 x i32>, <4 x i32> [[TMP3]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> <i32 2, i32 0, i32 3, i32 1>
	; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP2]], [[SHUFFLE]]
	; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0			; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
	; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1			; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
	; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2			; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
	; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3			; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*
				; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[INN_ADDR]] to <4 x i32>*
				; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> <i32 2, i32 0, i32 3, i32 1>
				; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP2]], [[SHUFFLE]]
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 2, i32 0>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 2, i32 0>
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[SHUFFLE1]], <4 x i32>* [[TMP6]], align 4			; CHECK-NEXT: store <4 x i32> [[SHUFFLE1]], <4 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%in.addr = getelementptr inbounds i32, i32* %in, i64 0			%in.addr = getelementptr inbounds i32, i32* %in, i64 0
	%load.1 = load i32, i32* %in.addr, align 4			%load.1 = load i32, i32* %in.addr, align 4
	%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 3			%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 3


	define i32 @jumbled-load-multiuses(i32* noalias nocapture %in, i32* noalias nocapture %out) {			define i32 @jumbled-load-multiuses(i32* noalias nocapture %in, i32* noalias nocapture %out) {
	; CHECK-LABEL: @jumbled-load-multiuses(			; CHECK-LABEL: @jumbled-load-multiuses(
	; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0			; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0
	; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3			; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3
	; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1			; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1
	; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2			; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2
				; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
				; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
				; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
				; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> poison, i32 [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> poison, i32 [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP2]], i32 2			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP2]], i32 2
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP5]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[TMP5]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[TMP7]], i32 2			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[TMP7]], i32 2
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP8]], i32 [[TMP9]], i32 3			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP8]], i32 [[TMP9]], i32 3
	; CHECK-NEXT: [[TMP11:%.*]] = mul <4 x i32> [[TMP2]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = mul <4 x i32> [[TMP2]], [[TMP10]]
	; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
	; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
	; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
	; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 2, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 2, i32 0>
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* [[TMP12]], align 4			; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* [[TMP12]], align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%in.addr = getelementptr inbounds i32, i32* %in, i64 0			%in.addr = getelementptr inbounds i32, i32* %in, i64 0
	%load.1 = load i32, i32* %in.addr, align 4			%load.1 = load i32, i32* %in.addr, align 4
	%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 3			%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 3

llvm/test/Transforms/SLPVectorizer/X86/jumbled_store_crash.ll


	define dso_local void @j() local_unnamed_addr {			define dso_local void @j() local_unnamed_addr {
	; CHECK-LABEL: @j(			; CHECK-LABEL: @j(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32*, i32** @b, align 8			; CHECK-NEXT: [[TMP0:%.*]] = load i32*, i32** @b, align 8
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 4			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 4
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 12			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 12
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[ARRAYIDX]] to <2 x i32>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 4
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 13			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 13
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[ARRAYIDX1]] to <2 x i32>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = add nsw <2 x i32> [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[TMP6:%.*]] = sitofp <2 x i32> [[TMP5]] to <2 x float>
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x float> [[TMP6]], <float 1.000000e+01, float 1.000000e+01>
	; CHECK-NEXT: [[TMP8:%.*]] = fsub <2 x float> <float 1.000000e+00, float 0.000000e+00>, [[TMP7]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 1
	; CHECK-NEXT: store float [[TMP9]], float* @g, align 4
	; CHECK-NEXT: [[TMP10:%.*]] = fadd <4 x float> [[SHUFFLE]], <float -1.000000e+00, float -1.000000e+00, float 1.000000e+00, float 1.000000e+00>
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP10]], i32 2
	; CHECK-NEXT: store float [[TMP11]], float* @c, align 4
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP10]], i32 0
	; CHECK-NEXT: store float [[TMP12]], float* @d, align 4
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP10]], i32 3
	; CHECK-NEXT: store float [[TMP13]], float* @e, align 4
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP10]], i32 1
	; CHECK-NEXT: store float [[TMP14]], float* @f, align 4
	; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 14			; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 14
	; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 15			; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 15
	; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* @a, align 4			; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* @a, align 4
	; CHECK-NEXT: [[CONV19:%.*]] = sitofp i32 [[TMP15]] to float			; CHECK-NEXT: [[CONV19:%.*]] = sitofp i32 [[TMP1]] to float
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[ARRAYIDX]] to <2 x i32>*
				; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4
				; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[ARRAYIDX1]] to <2 x i32>*
				; CHECK-NEXT: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[TMP4]], align 4
				; CHECK-NEXT: [[TMP6:%.*]] = add nsw <2 x i32> [[TMP5]], [[TMP3]]
				; CHECK-NEXT: [[TMP7:%.*]] = sitofp <2 x i32> [[TMP6]] to <2 x float>
				; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x float> [[TMP7]], <float 1.000000e+01, float 1.000000e+01>
				; CHECK-NEXT: [[TMP9:%.*]] = fsub <2 x float> <float 1.000000e+00, float 0.000000e+00>, [[TMP8]]
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>
				; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 1
				; CHECK-NEXT: store float [[TMP10]], float* @g, align 4
				; CHECK-NEXT: [[TMP11:%.*]] = fadd <4 x float> [[SHUFFLE]], <float -1.000000e+00, float -1.000000e+00, float 1.000000e+00, float 1.000000e+00>
				; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP11]], i32 2
				; CHECK-NEXT: store float [[TMP12]], float* @c, align 4
				; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP11]], i32 0
				; CHECK-NEXT: store float [[TMP13]], float* @d, align 4
				; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP11]], i32 3
				; CHECK-NEXT: store float [[TMP14]], float* @e, align 4
				; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP11]], i32 1
				; CHECK-NEXT: store float [[TMP15]], float* @f, align 4
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x float> <float poison, float -1.000000e+00, float poison, float -1.000000e+00>, float [[CONV19]], i32 0			; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x float> <float poison, float -1.000000e+00, float poison, float -1.000000e+00>, float [[CONV19]], i32 0
	; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 0
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x float> [[TMP16]], float [[TMP17]], i32 2			; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x float> [[TMP16]], float [[TMP17]], i32 2
	; CHECK-NEXT: [[TMP19:%.*]] = fsub <4 x float> [[TMP10]], [[TMP18]]			; CHECK-NEXT: [[TMP19:%.*]] = fsub <4 x float> [[TMP11]], [[TMP18]]
	; CHECK-NEXT: [[TMP20:%.*]] = fadd <4 x float> [[TMP10]], [[TMP18]]			; CHECK-NEXT: [[TMP20:%.*]] = fadd <4 x float> [[TMP11]], [[TMP18]]
	; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <4 x float> [[TMP19]], <4 x float> [[TMP20]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>			; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <4 x float> [[TMP19]], <4 x float> [[TMP20]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
	; CHECK-NEXT: [[TMP22:%.*]] = fptosi <4 x float> [[TMP21]] to <4 x i32>			; CHECK-NEXT: [[TMP22:%.*]] = fptosi <4 x float> [[TMP21]] to <4 x i32>
	; CHECK-NEXT: [[TMP23:%.*]] = bitcast i32* [[ARRAYIDX1]] to <4 x i32>*			; CHECK-NEXT: [[TMP23:%.*]] = bitcast i32* [[ARRAYIDX1]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP22]], <4 x i32>* [[TMP23]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP22]], <4 x i32>* [[TMP23]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i32*, i32** @b, align 8			%0 = load i32*, i32** @b, align 8

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

	; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 2			; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 2
	; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 3			; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 3
	; CHECK-NEXT: [[Q0:%.*]] = getelementptr inbounds i64, i64* [[Q:%.*]], i64 0			; CHECK-NEXT: [[Q0:%.*]] = getelementptr inbounds i64, i64* [[Q:%.*]], i64 0
	; CHECK-NEXT: [[Q1:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 1			; CHECK-NEXT: [[Q1:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 1
	; CHECK-NEXT: [[Q2:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 2			; CHECK-NEXT: [[Q2:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 2
	; CHECK-NEXT: [[Q3:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 3			; CHECK-NEXT: [[Q3:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[P0]] to <2 x i64>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[P0]] to <2 x i64>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 2			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 2
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[P2]] to <2 x i64>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[Q0]] to <2 x i64>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* [[TMP3]], align 2			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* [[TMP3]], align 2
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[Q0]] to <2 x i64>*			; CHECK-NEXT: [[TMP5:%.*]] = sub nsw <2 x i64> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* [[TMP5]], align 2			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[P2]] to <2 x i64>*
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i64* [[Q2]] to <2 x i64>*			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* [[TMP6]], align 2
	; CHECK-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* [[TMP7]], align 2			; CHECK-NEXT: [[TMP8:%.*]] = bitcast i64* [[Q2]] to <2 x i64>*
	; CHECK-NEXT: [[TMP9:%.*]] = sub nsw <2 x i64> [[TMP2]], [[TMP6]]			; CHECK-NEXT: [[TMP9:%.*]] = load <2 x i64>, <2 x i64>* [[TMP8]], align 2
	; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <2 x i64> [[TMP4]], [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <2 x i64> [[TMP7]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0
	; CHECK-NEXT: [[G0:%.*]] = getelementptr inbounds i32, i32* [[R:%.*]], i64 [[TMP11]]			; CHECK-NEXT: [[G0:%.*]] = getelementptr inbounds i32, i32* [[R:%.*]], i64 [[TMP11]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i64> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i64> [[TMP5]], i32 1
	; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP12]]			; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP12]]
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i64> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i64> [[TMP10]], i32 0
	; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP13]]			; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP13]]
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i64> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i64> [[TMP10]], i32 1
	; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP14]]			; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP14]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%p0 = getelementptr inbounds i64, i64* %p, i64 0			%p0 = getelementptr inbounds i64, i64* %p, i64 0

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

	; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 2			; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 2
	; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 3			; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i64, i64* [[P]], i64 3
	; CHECK-NEXT: [[Q0:%.*]] = getelementptr inbounds i64, i64* [[Q:%.*]], i64 0			; CHECK-NEXT: [[Q0:%.*]] = getelementptr inbounds i64, i64* [[Q:%.*]], i64 0
	; CHECK-NEXT: [[Q1:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 1			; CHECK-NEXT: [[Q1:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 1
	; CHECK-NEXT: [[Q2:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 2			; CHECK-NEXT: [[Q2:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 2
	; CHECK-NEXT: [[Q3:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 3			; CHECK-NEXT: [[Q3:%.*]] = getelementptr inbounds i64, i64* [[Q]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[P0]] to <2 x i64>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[P0]] to <2 x i64>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 2			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 2
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[P2]] to <2 x i64>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[Q0]] to <2 x i64>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* [[TMP3]], align 2			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* [[TMP3]], align 2
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[Q0]] to <2 x i64>*			; CHECK-NEXT: [[TMP5:%.*]] = sub nsw <2 x i64> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* [[TMP5]], align 2			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[P2]] to <2 x i64>*
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i64* [[Q2]] to <2 x i64>*			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* [[TMP6]], align 2
	; CHECK-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* [[TMP7]], align 2			; CHECK-NEXT: [[TMP8:%.*]] = bitcast i64* [[Q2]] to <2 x i64>*
	; CHECK-NEXT: [[TMP9:%.*]] = sub nsw <2 x i64> [[TMP2]], [[TMP6]]			; CHECK-NEXT: [[TMP9:%.*]] = load <2 x i64>, <2 x i64>* [[TMP8]], align 2
	; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <2 x i64> [[TMP4]], [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <2 x i64> [[TMP7]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0
	; CHECK-NEXT: [[G0:%.*]] = getelementptr inbounds i32, i32* [[R:%.*]], i64 [[TMP11]]			; CHECK-NEXT: [[G0:%.*]] = getelementptr inbounds i32, i32* [[R:%.*]], i64 [[TMP11]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i64> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i64> [[TMP5]], i32 1
	; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP12]]			; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP12]]
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i64> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i64> [[TMP10]], i32 0
	; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP13]]			; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP13]]
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i64> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i64> [[TMP10]], i32 1
	; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP14]]			; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds i32, i32* [[R]], i64 [[TMP14]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%p0 = getelementptr inbounds i64, i64* %p, i64 0			%p0 = getelementptr inbounds i64, i64* %p, i64 0

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0		; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0
; CHECK-NEXT: [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0		; CHECK-NEXT: [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0
; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0		; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0
; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0		; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0
; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1		; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1
; CHECK-NEXT: [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2		; CHECK-NEXT: [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2
; CHECK-NEXT: [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2		; CHECK-NEXT: [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2
; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1		; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1
		; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0
		; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1
; CHECK-NEXT: [[B0:%.*]] = load double, double* [[IDXB0]], align 8		; CHECK-NEXT: [[B0:%.*]] = load double, double* [[IDXB0]], align 8
; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8		; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8
; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8		; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8
; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
; CHECK-NEXT: [[B2:%.*]] = load double, double* [[IDXB2]], align 8		; CHECK-NEXT: [[B2:%.*]] = load double, double* [[IDXB2]], align 8
; CHECK-NEXT: [[A2:%.*]] = load double, double* [[IDXA2]], align 8		; CHECK-NEXT: [[A2:%.*]] = load double, double* [[IDXA2]], align 8
; CHECK-NEXT: [[B1:%.*]] = load double, double* [[IDXB1]], align 8		; CHECK-NEXT: [[B1:%.*]] = load double, double* [[IDXB1]], align 8
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B2]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B2]], i32 1
; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A2]], i32 1		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A2]], i32 1
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0		; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[B1]], i32 1		; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[B1]], i32 1
; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP6]], [[TMP8]]		; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP6]], [[TMP8]]
; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP9]]		; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP9]]
; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0
; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1
; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*		; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8		; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8
; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP1]], i32 1		; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
; CHECK-NEXT: store double [[TMP12]], double* [[EXT1:%.*]], align 8		; CHECK-NEXT: store double [[TMP12]], double* [[EXT1:%.*]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%IdxA0 = getelementptr inbounds double, double* %A, i64 0		%IdxA0 = getelementptr inbounds double, double* %A, i64 0
; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0		; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0
; CHECK-NEXT: [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0		; CHECK-NEXT: [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0
; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0		; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0
; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0		; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0
; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1		; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1
; CHECK-NEXT: [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2		; CHECK-NEXT: [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2
; CHECK-NEXT: [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2		; CHECK-NEXT: [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2
; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1		; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1
		; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0
		; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1
; CHECK-NEXT: [[B0:%.*]] = load double, double* [[IDXB0]], align 8		; CHECK-NEXT: [[B0:%.*]] = load double, double* [[IDXB0]], align 8
; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8		; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8
; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8		; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8
; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
; CHECK-NEXT: [[B2:%.*]] = load double, double* [[IDXB2]], align 8		; CHECK-NEXT: [[B2:%.*]] = load double, double* [[IDXB2]], align 8
; CHECK-NEXT: [[A2:%.*]] = load double, double* [[IDXA2]], align 8		; CHECK-NEXT: [[A2:%.*]] = load double, double* [[IDXA2]], align 8
; CHECK-NEXT: [[B1:%.*]] = load double, double* [[IDXB1]], align 8		; CHECK-NEXT: [[B1:%.*]] = load double, double* [[IDXB1]], align 8
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B2]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B2]], i32 1
; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A2]], i32 1		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A2]], i32 1
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0		; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[B1]], i32 1		; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[B1]], i32 1
; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP6]], [[TMP8]]		; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP6]], [[TMP8]]
; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP9]]		; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP9]]
; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0
; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1
; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*		; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8		; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8
; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP1]], i32 1		; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
; CHECK-NEXT: store double [[TMP12]], double* [[EXT1:%.*]], align 8		; CHECK-NEXT: store double [[TMP12]], double* [[EXT1:%.*]], align 8
; CHECK-NEXT: store double [[TMP12]], double* [[EXT2:%.*]], align 8		; CHECK-NEXT: store double [[TMP12]], double* [[EXT2:%.*]], align 8
; CHECK-NEXT: store double [[TMP12]], double* [[EXT3:%.*]], align 8		; CHECK-NEXT: store double [[TMP12]], double* [[EXT3:%.*]], align 8
; CHECK-NEXT: store double [[B1]], double* [[EXT4:%.*]], align 8		; CHECK-NEXT: store double [[B1]], double* [[EXT4:%.*]], align 8
; CHECK-NEXT: store double [[B1]], double* [[EXT5:%.*]], align 8		; CHECK-NEXT: store double [[B1]], double* [[EXT5:%.*]], align 8
%Class = type { i8 }		%Class = type { i8 }
declare double @_ZN1i2ayEv(%Class*)		declare double @_ZN1i2ayEv(%Class*)
declare double @_ZN1i2axEv()		declare double @_ZN1i2axEv()

define void @lookahead_crash(double* %A, double *%S, %Class *%Arg0) {		define void @lookahead_crash(double* %A, double *%S, %Class *%Arg0) {
; CHECK-LABEL: @lookahead_crash(		; CHECK-LABEL: @lookahead_crash(
; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0		; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0
; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1		; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1
		; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0
		; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1
; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*		; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8
; CHECK-NEXT: [[C0:%.*]] = call double @_ZN1i2ayEv(%Class* [[ARG0:%.*]])		; CHECK-NEXT: [[C0:%.*]] = call double @_ZN1i2ayEv(%Class* [[ARG0:%.*]])
; CHECK-NEXT: [[C1:%.*]] = call double @_ZN1i2axEv()		; CHECK-NEXT: [[C1:%.*]] = call double @_ZN1i2axEv()
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[C1]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[C1]], i32 1
; CHECK-NEXT: [[TMP5:%.*]] = fadd fast <2 x double> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = fadd fast <2 x double> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0
; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1
; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*		; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8		; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%IdxA0 = getelementptr inbounds double, double* %A, i64 0		%IdxA0 = getelementptr inbounds double, double* %A, i64 0
%IdxA1 = getelementptr inbounds double, double* %A, i64 1		%IdxA1 = getelementptr inbounds double, double* %A, i64 1

%A0 = load double, double *%IdxA0, align 8		%A0 = load double, double *%IdxA0, align 8
[16 lines not shown]
define void @ChecksExtractScores(double* %storeArray, double* %array, <2 x double>* %vecPtr1, <2 x double>* %vecPtr2) {		define void @ChecksExtractScores(double* %storeArray, double* %array, <2 x double>* %vecPtr1, <2 x double>* %vecPtr2) {
; CHECK-LABEL: @ChecksExtractScores(		; CHECK-LABEL: @ChecksExtractScores(
; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0		; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0
; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1		; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1
; CHECK-NEXT: [[LOADA0:%.*]] = load double, double* [[IDX0]], align 4		; CHECK-NEXT: [[LOADA0:%.*]] = load double, double* [[IDX0]], align 4
; CHECK-NEXT: [[LOADA1:%.*]] = load double, double* [[IDX1]], align 4		; CHECK-NEXT: [[LOADA1:%.*]] = load double, double* [[IDX1]], align 4
; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4		; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4
; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4		; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4
		; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
		; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[LOADA0]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[LOADA0]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[LOADA0]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[LOADA0]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[LOADVEC]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[LOADVEC]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[LOADA1]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[LOADA1]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[LOADA1]], i32 1		; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[LOADA1]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = fmul <2 x double> [[LOADVEC2]], [[TMP5]]		; CHECK-NEXT: [[TMP6:%.*]] = fmul <2 x double> [[LOADVEC2]], [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP3]], [[TMP6]]		; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP3]], [[TMP6]]
; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1
; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*		; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8		; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%idx0 = getelementptr inbounds double, double* %array, i64 0		%idx0 = getelementptr inbounds double, double* %array, i64 0
%idx1 = getelementptr inbounds double, double* %array, i64 1		%idx1 = getelementptr inbounds double, double* %array, i64 1
%loadA0 = load double, double* %idx0, align 4		%loadA0 = load double, double* %idx0, align 4
%loadA1 = load double, double* %idx1, align 4		%loadA1 = load double, double* %idx1, align 4
[109 lines not shown]	;
ret i1 %cmp.i185		ret i1 %cmp.i185
}		}

; Same as @ChecksExtractScores, but the extratelement vector operands do not match.		; Same as @ChecksExtractScores, but the extratelement vector operands do not match.
define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double>* %vecPtr1, <2 x double>* %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {		define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double>* %vecPtr1, <2 x double>* %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {
; CHECK-LABEL: @ChecksExtractScores_different_vectors(		; CHECK-LABEL: @ChecksExtractScores_different_vectors(
; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0		; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0
; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1		; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1
; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4		; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4
; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4		; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4
; CHECK-NEXT: [[EXTRA0:%.*]] = extractelement <2 x double> [[LOADVEC]], i32 0		; CHECK-NEXT: [[EXTRA0:%.*]] = extractelement <2 x double> [[LOADVEC]], i32 0
; CHECK-NEXT: [[EXTRA1:%.*]] = extractelement <2 x double> [[LOADVEC2]], i32 1		; CHECK-NEXT: [[EXTRA1:%.*]] = extractelement <2 x double> [[LOADVEC2]], i32 1
; CHECK-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4		; CHECK-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4
; CHECK-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4		; CHECK-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4
; CHECK-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0		; CHECK-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0
; CHECK-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1		; CHECK-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1
		; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
		; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1
		; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[EXTRA1]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[EXTRA1]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[EXTRB0]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[EXTRB0]], i32 1
; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], [[TMP2]]		; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], [[TMP2]]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[EXTRB1]], i32 1		; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[EXTRB1]], i32 1
; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP7]], [[TMP2]]		; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP7]], [[TMP2]]
; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[SHUFFLE]], [[TMP8]]		; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[SHUFFLE]], [[TMP8]]
; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1
; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*		; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8		; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%idx0 = getelementptr inbounds double, double* %array, i64 0		%idx0 = getelementptr inbounds double, double* %array, i64 0
%idx1 = getelementptr inbounds double, double* %array, i64 1		%idx1 = getelementptr inbounds double, double* %array, i64 1
%loadA0 = load double, double* %idx0, align 4		%loadA0 = load double, double* %idx0, align 4
%loadA1 = load double, double* %idx1, align 4		%loadA1 = load double, double* %idx1, align 4
[23 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/metadata.ll

[28 lines not shown]	entry:
%arrayidx5 = getelementptr inbounds double, double* %c, i64 1		%arrayidx5 = getelementptr inbounds double, double* %c, i64 1
store double %mul5, double* %arrayidx5, align 8, !tbaa !4		store double %mul5, double* %arrayidx5, align 8, !tbaa !4
ret void		ret void
}		}

define void @test2(double* %a, double* %b, i8* %e) {		define void @test2(double* %a, double* %b, i8* %e) {
; CHECK-LABEL: @test2(		; CHECK-LABEL: @test2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[C:%.*]] = bitcast i8* [[E:%.*]] to double*
; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8, !tbaa [[TBAA0]]		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8, !tbaa [[TBAA0]]
; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*		; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*
; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8, !tbaa [[TBAA0]]		; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8, !tbaa [[TBAA0]]
; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]], !fpmath !5		; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]], !fpmath !5
; CHECK-NEXT: [[C:%.*]] = bitcast i8* [[E:%.*]] to double*
; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[C]] to <2 x double>*		; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[C]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8, !tbaa [[TBAA0]]		; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8, !tbaa [[TBAA0]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%i0 = load double, double* %a, align 8, !tbaa !4		%i0 = load double, double* %a, align 8, !tbaa !4
%i1 = load double, double* %b, align 8, !tbaa !4		%i1 = load double, double* %b, align 8, !tbaa !4
%mul = fmul double %i0, %i1, !fpmath !1		%mul = fmul double %i0, %i1, !fpmath !1
[20 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/multi_block.ll

	[22 lines not shown]
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = fptrunc <2 x double> [[TMP2]] to <2 x float>			; CHECK-NEXT: [[TMP3:%.*]] = fptrunc <2 x double> [[TMP2]] to <2 x float>
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[D:%.*]], 0			; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[D:%.*]], 0
	; CHECK-NEXT: br i1 [[TMP4]], label [[TMP7:%.*]], label [[TMP5:%.*]]			; CHECK-NEXT: br i1 [[TMP4]], label [[TMP7:%.*]], label [[TMP5:%.*]]
	; CHECK: 5:			; CHECK: 5:
	; CHECK-NEXT: [[TMP6:%.*]] = tail call i32 (...) @foo()			; CHECK-NEXT: [[TMP6:%.*]] = tail call i32 (...) @foo()
	; CHECK-NEXT: br label [[TMP7]]			; CHECK-NEXT: br label [[TMP7]]
	; CHECK: 7:			; CHECK: 7:
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP3]], <float 4.000000e+00, float 5.000000e+00>			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds double, double* [[A]], i64 8
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[A]], i64 8			; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[TMP3]], <float 4.000000e+00, float 5.000000e+00>
	; CHECK-NEXT: [[TMP10:%.*]] = fpext <2 x float> [[TMP8]] to <2 x double>			; CHECK-NEXT: [[TMP10:%.*]] = fpext <2 x float> [[TMP9]] to <2 x double>
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP10]], <double 9.000000e+00, double 5.000000e+00>			; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP10]], <double 9.000000e+00, double 5.000000e+00>
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP9]] to <2 x double>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP8]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP12]], align 8			; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP12]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%1 = load double, double* %A, align 8			%1 = load double, double* %A, align 8
	%2 = getelementptr inbounds double, double* %A, i64 1			%2 = getelementptr inbounds double, double* %A, i64 1
	%3 = load double, double* %2, align 8			%3 = load double, double* %2, align 8
	%4 = fptrunc double %1 to float			%4 = fptrunc double %1 to float
	%5 = fptrunc double %3 to float			%5 = fptrunc double %3 to float
	[23 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/phi_overalignedtype.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -slp-threshold=-100 -dce -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -slp-threshold=-100 -dce -S -mtriple=i386-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s

	; We purposely over-align f64 to 128bit here.			; We purposely over-align f64 to 128bit here.
	target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:128:128-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"			target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:128:128-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"
	target triple = "i386-apple-macosx10.9.0"			target triple = "i386-apple-macosx10.9.0"


	define void @test(double* %i1, double* %i2, double* %o) {			define void @test(double* %i1, double* %i2, double* %o) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[I1_0:%.*]] = load double, double* [[I1:%.*]], align 16			; CHECK-NEXT: [[I1_GEP1:%.*]] = getelementptr double, double* [[I1:%.*]], i64 1
	; CHECK-NEXT: [[I1_GEP1:%.*]] = getelementptr double, double* [[I1]], i64 1			; CHECK-NEXT: [[I1_0:%.*]] = load double, double* [[I1]], align 16
	; CHECK-NEXT: [[I1_1:%.*]] = load double, double* [[I1_GEP1]], align 16			; CHECK-NEXT: [[I1_1:%.*]] = load double, double* [[I1_GEP1]], align 16
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[I1_0]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[I1_0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[I1_1]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[I1_1]], i32 1
	; CHECK-NEXT: br i1 undef, label [[THEN:%.*]], label [[END:%.*]]			; CHECK-NEXT: br i1 undef, label [[THEN:%.*]], label [[END:%.*]]
	; CHECK: then:			; CHECK: then:
	; CHECK-NEXT: [[I2_GEP0:%.*]] = getelementptr inbounds double, double* [[I2:%.*]], i64 0			; CHECK-NEXT: [[I2_GEP0:%.*]] = getelementptr inbounds double, double* [[I2:%.*]], i64 0
	; CHECK-NEXT: [[I2_0:%.*]] = load double, double* [[I2_GEP0]], align 16
	; CHECK-NEXT: [[I2_GEP1:%.*]] = getelementptr inbounds double, double* [[I2]], i64 1			; CHECK-NEXT: [[I2_GEP1:%.*]] = getelementptr inbounds double, double* [[I2]], i64 1
				; CHECK-NEXT: [[I2_0:%.*]] = load double, double* [[I2_GEP0]], align 16
	; CHECK-NEXT: [[I2_1:%.*]] = load double, double* [[I2_GEP1]], align 16			; CHECK-NEXT: [[I2_1:%.*]] = load double, double* [[I2_GEP1]], align 16
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[I2_0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[I2_0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[I2_1]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[I2_1]], i32 1
	; CHECK-NEXT: br label [[END]]			; CHECK-NEXT: br label [[END]]
	; CHECK: end:			; CHECK: end:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x double> [ [[TMP1]], [[ENTRY:%.*]] ], [ [[TMP3]], [[THEN]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x double> [ [[TMP1]], [[ENTRY:%.*]] ], [ [[TMP3]], [[THEN]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
	; CHECK-NEXT: store double [[TMP5]], double* [[O:%.*]], align 16			; CHECK-NEXT: store double [[TMP5]], double* [[O:%.*]], align 16
	[29 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/powof2div.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX

	define void @powof2div_uniform(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32* noalias nocapture readonly %c){			define void @powof2div_uniform(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32* noalias nocapture readonly %c){
	; CHECK-LABEL: @powof2div_uniform(			; CHECK-LABEL: @powof2div_uniform(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2			; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
				; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
				; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = sdiv <4 x i32> [[TMP4]], <i32 2, i32 2, i32 2, i32 2>			; CHECK-NEXT: [[TMP5:%.*]] = sdiv <4 x i32> [[TMP4]], <i32 2, i32 2, i32 2, i32 2>
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i32, i32* %b, align 4			%0 = load i32, i32* %b, align 4
	%1 = load i32, i32* %c, align 4			%1 = load i32, i32* %c, align 4
	%add = add nsw i32 %1, %0			%add = add nsw i32 %1, %0
	[64 lines not shown]
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; AVX-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; AVX-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1			; AVX-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1
	; AVX-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1			; AVX-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1
	; AVX-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; AVX-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; AVX-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2			; AVX-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2
	; AVX-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2			; AVX-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; AVX-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; AVX-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
				; AVX-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
				; AVX-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; AVX-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; AVX-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; AVX-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
	; AVX-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*			; AVX-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*
	; AVX-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; AVX-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; AVX-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]			; AVX-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]
	; AVX-NEXT: [[TMP5:%.*]] = sdiv <4 x i32> [[TMP4]], <i32 2, i32 4, i32 8, i32 16>			; AVX-NEXT: [[TMP5:%.*]] = sdiv <4 x i32> [[TMP4]], <i32 2, i32 4, i32 8, i32 16>
	; AVX-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; AVX-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; AVX-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; AVX-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4			; AVX-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i32, i32* %b, align 4			%0 = load i32, i32* %b, align 4
	%1 = load i32, i32* %c, align 4			%1 = load i32, i32* %c, align 4
	%add = add nsw i32 %1, %0			%add = add nsw i32 %1, %0
	[29 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/powof2mul.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX

	define void @powof2mul_uniform(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32* noalias nocapture readonly %c){			define void @powof2mul_uniform(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32* noalias nocapture readonly %c){
	; CHECK-LABEL: @powof2mul_uniform(			; CHECK-LABEL: @powof2mul_uniform(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2			; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
				; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
				; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP4]], <i32 2, i32 2, i32 2, i32 2>			; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP4]], <i32 2, i32 2, i32 2, i32 2>
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i32, i32* %b, align 4			%0 = load i32, i32* %b, align 4
	%1 = load i32, i32* %c, align 4			%1 = load i32, i32* %c, align 4
	%add = add nsw i32 %1, %0			%add = add nsw i32 %1, %0
	[31 lines not shown]
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2			; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
				; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
				; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP4]], <i32 -2, i32 -2, i32 -2, i32 -2>			; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP4]], <i32 -2, i32 -2, i32 -2, i32 -2>
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i32, i32* %b, align 4			%0 = load i32, i32* %b, align 4
	%1 = load i32, i32* %c, align 4			%1 = load i32, i32* %c, align 4
	%add = add nsw i32 %1, %0			%add = add nsw i32 %1, %0
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2			; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
				; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
				; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP4]], <i32 2, i32 4, i32 8, i32 16>			; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP4]], <i32 2, i32 4, i32 8, i32 16>
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i32, i32* %b, align 4			%0 = load i32, i32* %b, align 4
	%1 = load i32, i32* %c, align 4			%1 = load i32, i32* %c, align 4
	%add = add nsw i32 %1, %0			%add = add nsw i32 %1, %0
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 2
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 2
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2			; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 3
				; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
				; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[C]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP4]], <i32 -2, i32 -4, i32 -8, i32 -16>			; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP4]], <i32 -2, i32 -4, i32 -8, i32 -16>
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i32, i32* %b, align 4			%0 = load i32, i32* %b, align 4
	%1 = load i32, i32* %c, align 4			%1 = load i32, i32* %c, align 4
	%add = add nsw i32 %1, %0			%add = add nsw i32 %1, %0
	; AVX-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 2			; AVX-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 2
	; AVX-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 3			; AVX-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 3
	; AVX-NEXT: [[GEP4:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 4			; AVX-NEXT: [[GEP4:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 4
	; AVX-NEXT: [[GEP5:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 5			; AVX-NEXT: [[GEP5:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 5
	; AVX-NEXT: [[GEP6:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 6			; AVX-NEXT: [[GEP6:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 6
	; AVX-NEXT: [[GEP7:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 7			; AVX-NEXT: [[GEP7:%.*]] = getelementptr inbounds i64, i64* [[A]], i64 7
	; AVX-NEXT: [[TMP0:%.*]] = bitcast i64* [[A]] to <4 x i64>*			; AVX-NEXT: [[TMP0:%.*]] = bitcast i64* [[A]] to <4 x i64>*
	; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* [[TMP0]], align 8			; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* [[TMP0]], align 8
	; AVX-NEXT: [[TMP2:%.*]] = bitcast i64* [[GEP4]] to <4 x i64>*			; AVX-NEXT: [[TMP2:%.*]] = mul <4 x i64> [[TMP1]], <i64 -17592186044416, i64 -17592186044416, i64 -17592186044416, i64 -17592186044416>
	; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* [[TMP2]], align 8			; AVX-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP2]], <i64 -17592186044416, i64 -17592186044416, i64 -17592186044416, i64 -17592186044416>
	; AVX-NEXT: [[TMP4:%.*]] = mul <4 x i64> [[TMP1]], <i64 -17592186044416, i64 -17592186044416, i64 -17592186044416, i64 -17592186044416>			; AVX-NEXT: [[TMP4:%.*]] = bitcast i64* [[A]] to <4 x i64>*
	; AVX-NEXT: [[TMP5:%.*]] = mul <4 x i64> [[TMP3]], <i64 -17592186044416, i64 -17592186044416, i64 -17592186044416, i64 -17592186044416>			; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* [[TMP4]], align 8
	; AVX-NEXT: [[TMP6:%.*]] = add <4 x i64> [[TMP4]], <i64 -17592186044416, i64 -17592186044416, i64 -17592186044416, i64 -17592186044416>			; AVX-NEXT: [[TMP5:%.*]] = bitcast i64* [[GEP4]] to <4 x i64>*
	; AVX-NEXT: [[TMP7:%.*]] = add <4 x i64> [[TMP5]], <i64 -17592186044416, i64 -17592186044416, i64 -17592186044416, i64 -17592186044416>			; AVX-NEXT: [[TMP6:%.*]] = load <4 x i64>, <4 x i64>* [[TMP5]], align 8
	; AVX-NEXT: [[TMP8:%.*]] = bitcast i64* [[A]] to <4 x i64>*			; AVX-NEXT: [[TMP7:%.*]] = mul <4 x i64> [[TMP6]], <i64 -17592186044416, i64 -17592186044416, i64 -17592186044416, i64 -17592186044416>
	; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* [[TMP8]], align 8			; AVX-NEXT: [[TMP8:%.*]] = add <4 x i64> [[TMP7]], <i64 -17592186044416, i64 -17592186044416, i64 -17592186044416, i64 -17592186044416>
	; AVX-NEXT: [[TMP9:%.*]] = bitcast i64* [[GEP4]] to <4 x i64>*			; AVX-NEXT: [[TMP9:%.*]] = bitcast i64* [[GEP4]] to <4 x i64>*
	; AVX-NEXT: store <4 x i64> [[TMP7]], <4 x i64>* [[TMP9]], align 8			; AVX-NEXT: store <4 x i64> [[TMP8]], <4 x i64>* [[TMP9]], align 8
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	entry:			entry:
	%gep1 = getelementptr inbounds i64, i64* %a, i64 1			%gep1 = getelementptr inbounds i64, i64* %a, i64 1
	%gep2 = getelementptr inbounds i64, i64* %a, i64 2			%gep2 = getelementptr inbounds i64, i64* %a, i64 2
	%gep3 = getelementptr inbounds i64, i64* %a, i64 3			%gep3 = getelementptr inbounds i64, i64* %a, i64 3
	%gep4 = getelementptr inbounds i64, i64* %a, i64 4			%gep4 = getelementptr inbounds i64, i64* %a, i64 4
	%gep5 = getelementptr inbounds i64, i64* %a, i64 5			%gep5 = getelementptr inbounds i64, i64* %a, i64 5

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @_ZN1C10SwitchModeEv(			; AVX-LABEL: @_ZN1C10SwitchModeEv(
	; AVX-NEXT: for.body.lr.ph.i:			; AVX-NEXT: for.body.lr.ph.i:
	; AVX-NEXT: [[OR_1:%.*]] = or i64 undef, 1			; AVX-NEXT: [[OR_1:%.*]] = or i64 undef, 1
	; AVX-NEXT: store i64 [[OR_1]], i64* undef, align 8			; AVX-NEXT: store i64 [[OR_1]], i64* undef, align 8
	; AVX-NEXT: [[FOO_1:%.*]] = getelementptr inbounds [[CLASS_1:%.*]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 0			; AVX-NEXT: [[FOO_1:%.*]] = getelementptr inbounds [[CLASS_1:%.*]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 0
	; AVX-NEXT: [[FOO_2:%.*]] = getelementptr inbounds [[CLASS_1]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 1			; AVX-NEXT: [[FOO_2:%.*]] = getelementptr inbounds [[CLASS_1]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 1
				; AVX-NEXT: [[BAR5:%.*]] = load i64, i64* undef, align 8
				; AVX-NEXT: [[BAR3:%.*]] = getelementptr inbounds [[CLASS_2:%.*]], %class.2* undef, i64 0, i32 0, i32 0, i32 0, i64 0
				; AVX-NEXT: [[BAR4:%.*]] = getelementptr inbounds [[CLASS_2]], %class.2* undef, i64 0, i32 0, i32 0, i32 0, i64 1
	; AVX-NEXT: [[TMP0:%.*]] = bitcast i64* [[FOO_1]] to <2 x i64>*			; AVX-NEXT: [[TMP0:%.*]] = bitcast i64* [[FOO_1]] to <2 x i64>*
	; AVX-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8			; AVX-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8
	; AVX-NEXT: [[BAR5:%.*]] = load i64, i64* undef, align 8
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> poison, i64 [[OR_1]], i32 0			; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> poison, i64 [[OR_1]], i32 0
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x i64> [[TMP2]], i64 [[BAR5]], i32 1			; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x i64> [[TMP2]], i64 [[BAR5]], i32 1
	; AVX-NEXT: [[TMP4:%.*]] = and <2 x i64> [[TMP3]], [[TMP1]]			; AVX-NEXT: [[TMP4:%.*]] = and <2 x i64> [[TMP3]], [[TMP1]]
	; AVX-NEXT: [[BAR3:%.*]] = getelementptr inbounds [[CLASS_2:%.*]], %class.2* undef, i64 0, i32 0, i32 0, i32 0, i64 0
	; AVX-NEXT: [[BAR4:%.*]] = getelementptr inbounds [[CLASS_2]], %class.2* undef, i64 0, i32 0, i32 0, i32 0, i64 1
	; AVX-NEXT: [[TMP5:%.*]] = bitcast i64* [[BAR3]] to <2 x i64>*			; AVX-NEXT: [[TMP5:%.*]] = bitcast i64* [[BAR3]] to <2 x i64>*
	; AVX-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP5]], align 8			; AVX-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP5]], align 8
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	for.body.lr.ph.i:			for.body.lr.ph.i:
	%or.1 = or i64 undef, 1			%or.1 = or i64 undef, 1
	store i64 %or.1, i64* undef, align 8			store i64 %or.1, i64* undef, align 8
	%foo.1 = getelementptr inbounds %class.1, %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 0			%foo.1 = getelementptr inbounds %class.1, %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 0
	; Function Attrs: norecurse nounwind uwtable			; Function Attrs: norecurse nounwind uwtable
	define void @pr35497() local_unnamed_addr #0 {			define void @pr35497() local_unnamed_addr #0 {
	; SSE-LABEL: @pr35497(			; SSE-LABEL: @pr35497(
	; SSE-NEXT: entry:			; SSE-NEXT: entry:
	; SSE-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1			; SSE-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1
	; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef			; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef
	; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1			; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1
	; SSE-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5			; SSE-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5
				; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
				; SSE-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1
				; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
	; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1			; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1
	; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>			; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
	; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>			; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
	; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
	; SSE-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer			; SSE-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer
	; SSE-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1
	; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1			; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1
	; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0			; SSE-NEXT: [[TMP6:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
	; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> [[TMP6]], i64 [[ADD]], i32 1			; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP6]], align 1
	; SSE-NEXT: [[TMP8:%.*]] = shl <2 x i64> [[TMP7]], <i64 2, i64 2>			; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0
	; SSE-NEXT: [[TMP9:%.*]] = and <2 x i64> [[TMP8]], <i64 20, i64 20>			; SSE-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> [[TMP7]], i64 [[ADD]], i32 1
	; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0			; SSE-NEXT: [[TMP9:%.*]] = shl <2 x i64> [[TMP8]], <i64 2, i64 2>
	; SSE-NEXT: [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*			; SSE-NEXT: [[TMP10:%.*]] = and <2 x i64> [[TMP9]], <i64 20, i64 20>
	; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1
	; SSE-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>			; SSE-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
	; SSE-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP11]]			; SSE-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP10]], [[TMP11]]
	; SSE-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*			; SSE-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
	; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1			; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @pr35497(			; AVX-LABEL: @pr35497(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1			; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1
	; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef			; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef
	; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1			; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1
	; AVX-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5			; AVX-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5
				; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
				; AVX-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1
				; AVX-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1			; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1
	; AVX-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>			; AVX-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>
	; AVX-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>			; AVX-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>
	; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
	; AVX-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer			; AVX-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer
	; AVX-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1
	; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1			; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0			; AVX-NEXT: [[TMP6:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> [[TMP6]], i64 [[ADD]], i32 1			; AVX-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP6]], align 1
	; AVX-NEXT: [[TMP8:%.*]] = shl <2 x i64> [[TMP7]], <i64 2, i64 2>			; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0
	; AVX-NEXT: [[TMP9:%.*]] = and <2 x i64> [[TMP8]], <i64 20, i64 20>			; AVX-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> [[TMP7]], i64 [[ADD]], i32 1
	; AVX-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0			; AVX-NEXT: [[TMP9:%.*]] = shl <2 x i64> [[TMP8]], <i64 2, i64 2>
	; AVX-NEXT: [[TMP10:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*			; AVX-NEXT: [[TMP10:%.*]] = and <2 x i64> [[TMP9]], <i64 20, i64 20>
	; AVX-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP10]], align 1
	; AVX-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>			; AVX-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
	; AVX-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP9]], [[TMP11]]			; AVX-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP10]], [[TMP11]]
	; AVX-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*			; AVX-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
	; AVX-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1			; AVX-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i64, i64* undef, align 1			%0 = load i64, i64* undef, align 1
	%and = shl i64 %0, 2			%and = shl i64 %0, 2
	%shl = and i64 %and, 20			%shl = and i64 %and, 20

llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll

; SSE-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5		; SSE-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
; SSE-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP15]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP15]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP17:%.*]] = add nsw i32 [[TMP16]], 4		; SSE-NEXT: [[TMP17:%.*]] = add nsw i32 [[TMP16]], 4
; SSE-NEXT: store i32 [[TMP17]], i32* [[TMP14]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: store i32 [[TMP17]], i32* [[TMP14]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_2(		; AVX-LABEL: @gather_load_2(
; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; AVX-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10		; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3
; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3		; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5		; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP9]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i64 0		; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP7]], i64 0
; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP6]], i64 1		; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i64 1
; AVX-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP8]], i64 2		; AVX-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i64 2
; AVX-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i64 3		; AVX-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i64 3
; AVX-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>		; AVX-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>
; AVX-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*		; AVX-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
; AVX-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_2(		; AVX2-LABEL: @gather_load_2(
; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; AVX2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3
; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3		; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5		; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP9]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i64 0		; AVX2-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP7]], i64 0
; AVX2-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP6]], i64 1		; AVX2-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i64 1
; AVX2-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP8]], i64 2		; AVX2-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i64 2
; AVX2-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i64 3		; AVX2-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i64 3
; AVX2-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>		; AVX2-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>
; AVX2-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*		; AVX2-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
; AVX2-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512F-LABEL: @gather_load_2(		; AVX512F-LABEL: @gather_load_2(
; AVX512F-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; AVX512F-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; SSE-NEXT: store i32 [[TMP28]], i32* [[TMP25]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: store i32 [[TMP28]], i32* [[TMP25]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; SSE-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; SSE-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4		; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4
; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_3(		; AVX-LABEL: @gather_load_3(
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11
; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11		; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
; AVX-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4		; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18
; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15		; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18		; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP10]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9		; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6		; AVX-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; AVX-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP9]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i64 0		; AVX-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP10]], i64 0
; AVX-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP5]], i64 1		; AVX-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP11]], i64 1
; AVX-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP7]], i64 2		; AVX-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP12]], i64 2
; AVX-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP9]], i64 3		; AVX-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP13]], i64 3
; AVX-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP11]], i64 4		; AVX-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP14]], i64 4
; AVX-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP13]], i64 5		; AVX-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP15]], i64 5
; AVX-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP15]], i64 6		; AVX-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP16]], i64 6
; AVX-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i64 7		; AVX-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i64 7
; AVX-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>		; AVX-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
; AVX-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*		; AVX-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*
; AVX-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_3(		; AVX2-LABEL: @gather_load_3(
; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11
; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
; AVX2-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18
; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15		; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18		; AVX2-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP10]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9		; AVX2-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6		; AVX2-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; AVX2-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP9]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i64 0		; AVX2-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP10]], i64 0
; AVX2-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP5]], i64 1		; AVX2-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP11]], i64 1
; AVX2-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP7]], i64 2		; AVX2-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP12]], i64 2
; AVX2-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP9]], i64 3		; AVX2-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP13]], i64 3
; AVX2-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP11]], i64 4		; AVX2-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP14]], i64 4
; AVX2-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP13]], i64 5		; AVX2-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP15]], i64 5
; AVX2-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP15]], i64 6		; AVX2-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP16]], i64 6
; AVX2-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i64 7		; AVX2-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i64 7
; AVX2-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>		; AVX2-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
; AVX2-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*		; AVX2-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*
; AVX2-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512F-LABEL: @gather_load_3(		; AVX512F-LABEL: @gather_load_3(
; AVX512F-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: ret void		; AVX512F-NEXT: ret void
;		;
; AVX512VL-LABEL: @gather_load_3(		; AVX512VL-LABEL: @gather_load_3(
; AVX512VL-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1		; AVX512VL-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1
; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1
; AVX512VL-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP6:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i64 0		; AVX512VL-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5
; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> poison, <4 x i32> zeroinitializer		; AVX512VL-NEXT: [[TMP7:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i64 0
; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>		; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP7]], <4 x i32*> poison, <4 x i32> zeroinitializer
; AVX512VL-NEXT: [[TMP8:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP7]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[TMP8:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
; AVX512VL-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1>		; AVX512VL-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP8]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5		; AVX512VL-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 2, i32 3, i32 4, i32 1>
; AVX512VL-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*		; AVX512VL-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*
; AVX512VL-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP11]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP11]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9		; AVX512VL-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
; AVX512VL-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP14:%.*]] = add i32 [[TMP13]], 2		; AVX512VL-NEXT: [[TMP14:%.*]] = add i32 [[TMP13]], 2
; AVX512VL-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6		; AVX512VL-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6
; AVX512VL-NEXT: store i32 [[TMP14]], i32* [[TMP10]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[TMP14]], i32* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6		; AVX512VL-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX512VL-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP18:%.*]] = add i32 [[TMP17]], 3		; AVX512VL-NEXT: [[TMP18:%.*]] = add i32 [[TMP17]], 3
; AVX512VL-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7		; AVX512VL-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7
; AVX512VL-NEXT: store i32 [[TMP18]], i32* [[TMP15]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[TMP18]], i32* [[TMP15]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; AVX512VL-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX512VL-NEXT: [[TMP21:%.*]] = load i32, i32* [[TMP20]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[TMP21:%.*]] = load i32, i32* [[TMP20]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP22:%.*]] = add i32 [[TMP21]], 4		; AVX512VL-NEXT: [[TMP22:%.*]] = add i32 [[TMP21]], 4
; AVX512F-NEXT: store i32 [[T20]], i32* [[T17]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: store i32 [[T20]], i32* [[T17]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: ret void		; AVX512F-NEXT: ret void
;		;
; AVX512VL-LABEL: @gather_load_4(		; AVX512VL-LABEL: @gather_load_4(
; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1		; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i64 0
; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer
; AVX512VL-NEXT: [[TMP2:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
; AVX512VL-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5		; AVX512VL-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
; AVX512VL-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9		; AVX512VL-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 9
; AVX512VL-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6		; AVX512VL-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
; AVX512VL-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6		; AVX512VL-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
; AVX512VL-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7		; AVX512VL-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
; AVX512VL-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21		; AVX512VL-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP2]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[T4:%.*]] = add i32 [[T3]], 1		; AVX512VL-NEXT: [[T4:%.*]] = add i32 [[T3]], 1
; AVX512VL-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP3]], <i32 2, i32 3, i32 4, i32 1>
; AVX512VL-NEXT: [[T24:%.*]] = add i32 [[T23]], 2		; AVX512VL-NEXT: [[T24:%.*]] = add i32 [[T23]], 2
; AVX512VL-NEXT: [[T28:%.*]] = add i32 [[T27]], 3		; AVX512VL-NEXT: [[T28:%.*]] = add i32 [[T27]], 3
; AVX512VL-NEXT: [[T32:%.*]] = add i32 [[T31]], 4		; AVX512VL-NEXT: [[T32:%.*]] = add i32 [[T31]], 4
; AVX512VL-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]
		; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1]], i64 0
		; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer
		; AVX512VL-NEXT: [[TMP2:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
		; AVX512VL-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP2]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]
		; AVX512VL-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP3]], <i32 2, i32 3, i32 4, i32 1>
; AVX512VL-NEXT: [[TMP5:%.*]] = bitcast i32* [[T5]] to <4 x i32>*		; AVX512VL-NEXT: [[TMP5:%.*]] = bitcast i32* [[T5]] to <4 x i32>*
; AVX512VL-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: ret void		; AVX512VL-NEXT: ret void
;		;
%t5 = getelementptr inbounds i32, i32* %t0, i64 1		%t5 = getelementptr inbounds i32, i32* %t0, i64 1
store i32 %t32, i32* %t29, align 4, !tbaa !2		store i32 %t32, i32* %t29, align 4, !tbaa !2

ret void		ret void
}		}


define void @gather_load_div(float* noalias nocapture %0, float* noalias nocapture readonly %1) {		define void @gather_load_div(float* noalias nocapture %0, float* noalias nocapture readonly %1) {
; SSE-LABEL: @gather_load_div(		; SSE-LABEL: @gather_load_div(
; SSE-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 4
; SSE-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4		; SSE-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; SSE-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; SSE-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10		; SSE-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; SSE-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; SSE-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13		; SSE-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; SSE-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; SSE-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; SSE-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4
; SSE-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP11:%.*]] = load float, float* [[TMP1]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11		; SSE-NEXT: [[TMP12:%.*]] = load float, float* [[TMP3]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP13:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; SSE-NEXT: [[TMP14:%.*]] = load float, float* [[TMP5]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP15:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP15:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44		; SSE-NEXT: [[TMP16:%.*]] = load float, float* [[TMP7]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP17:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP17:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP18:%.*]] = insertelement <4 x float> poison, float [[TMP3]], i64 0		; SSE-NEXT: [[TMP18:%.*]] = load float, float* [[TMP9]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP19:%.*]] = insertelement <4 x float> [[TMP18]], float [[TMP7]], i64 1		; SSE-NEXT: [[TMP19:%.*]] = insertelement <4 x float> poison, float [[TMP11]], i64 0
; SSE-NEXT: [[TMP20:%.*]] = insertelement <4 x float> [[TMP19]], float [[TMP11]], i64 2		; SSE-NEXT: [[TMP20:%.*]] = insertelement <4 x float> [[TMP19]], float [[TMP13]], i64 1
; SSE-NEXT: [[TMP21:%.*]] = insertelement <4 x float> [[TMP20]], float [[TMP15]], i64 3		; SSE-NEXT: [[TMP21:%.*]] = insertelement <4 x float> [[TMP20]], float [[TMP15]], i64 2
; SSE-NEXT: [[TMP22:%.*]] = insertelement <4 x float> poison, float [[TMP5]], i64 0		; SSE-NEXT: [[TMP22:%.*]] = insertelement <4 x float> [[TMP21]], float [[TMP17]], i64 3
; SSE-NEXT: [[TMP23:%.*]] = insertelement <4 x float> [[TMP22]], float [[TMP9]], i64 1		; SSE-NEXT: [[TMP23:%.*]] = insertelement <4 x float> poison, float [[TMP12]], i64 0
; SSE-NEXT: [[TMP24:%.*]] = insertelement <4 x float> [[TMP23]], float [[TMP13]], i64 2		; SSE-NEXT: [[TMP24:%.*]] = insertelement <4 x float> [[TMP23]], float [[TMP14]], i64 1
; SSE-NEXT: [[TMP25:%.*]] = insertelement <4 x float> [[TMP24]], float [[TMP17]], i64 3		; SSE-NEXT: [[TMP25:%.*]] = insertelement <4 x float> [[TMP24]], float [[TMP16]], i64 2
; SSE-NEXT: [[TMP26:%.*]] = fdiv <4 x float> [[TMP21]], [[TMP25]]		; SSE-NEXT: [[TMP26:%.*]] = insertelement <4 x float> [[TMP25]], float [[TMP18]], i64 3
; SSE-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4		; SSE-NEXT: [[TMP27:%.*]] = fdiv <4 x float> [[TMP22]], [[TMP26]]
; SSE-NEXT: [[TMP28:%.*]] = bitcast float* [[TMP0]] to <4 x float>*		; SSE-NEXT: [[TMP28:%.*]] = bitcast float* [[TMP0]] to <4 x float>*
; SSE-NEXT: store <4 x float> [[TMP26]], <4 x float>* [[TMP28]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: store <4 x float> [[TMP27]], <4 x float>* [[TMP28]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; SSE-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
; SSE-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP30:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33		; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8
; SSE-NEXT: [[TMP32:%.*]] = load float, float* [[TMP31]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30
; SSE-NEXT: [[TMP33:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; SSE-NEXT: [[TMP33:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
; SSE-NEXT: [[TMP34:%.*]] = load float, float* [[TMP33]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP34:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
; SSE-NEXT: [[TMP35:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30		; SSE-NEXT: [[TMP35:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
; SSE-NEXT: [[TMP36:%.*]] = load float, float* [[TMP35]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP36:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
; SSE-NEXT: [[TMP37:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5		; SSE-NEXT: [[TMP37:%.*]] = load float, float* [[TMP29]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP38:%.*]] = load float, float* [[TMP37]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP38:%.*]] = load float, float* [[TMP30]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP39:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27		; SSE-NEXT: [[TMP39:%.*]] = load float, float* [[TMP31]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP40:%.*]] = load float, float* [[TMP39]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP40:%.*]] = load float, float* [[TMP32]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP41:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20		; SSE-NEXT: [[TMP41:%.*]] = load float, float* [[TMP33]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP42:%.*]] = load float, float* [[TMP41]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP42:%.*]] = load float, float* [[TMP34]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP43:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23		; SSE-NEXT: [[TMP43:%.*]] = load float, float* [[TMP35]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP44:%.*]] = load float, float* [[TMP43]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP44:%.*]] = load float, float* [[TMP36]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP45:%.*]] = insertelement <4 x float> poison, float [[TMP30]], i64 0		; SSE-NEXT: [[TMP45:%.*]] = insertelement <4 x float> poison, float [[TMP37]], i64 0
; SSE-NEXT: [[TMP46:%.*]] = insertelement <4 x float> [[TMP45]], float [[TMP34]], i64 1		; SSE-NEXT: [[TMP46:%.*]] = insertelement <4 x float> [[TMP45]], float [[TMP39]], i64 1
; SSE-NEXT: [[TMP47:%.*]] = insertelement <4 x float> [[TMP46]], float [[TMP38]], i64 2		; SSE-NEXT: [[TMP47:%.*]] = insertelement <4 x float> [[TMP46]], float [[TMP41]], i64 2
; SSE-NEXT: [[TMP48:%.*]] = insertelement <4 x float> [[TMP47]], float [[TMP42]], i64 3		; SSE-NEXT: [[TMP48:%.*]] = insertelement <4 x float> [[TMP47]], float [[TMP43]], i64 3
; SSE-NEXT: [[TMP49:%.*]] = insertelement <4 x float> poison, float [[TMP32]], i64 0		; SSE-NEXT: [[TMP49:%.*]] = insertelement <4 x float> poison, float [[TMP38]], i64 0
; SSE-NEXT: [[TMP50:%.*]] = insertelement <4 x float> [[TMP49]], float [[TMP36]], i64 1		; SSE-NEXT: [[TMP50:%.*]] = insertelement <4 x float> [[TMP49]], float [[TMP40]], i64 1
; SSE-NEXT: [[TMP51:%.*]] = insertelement <4 x float> [[TMP50]], float [[TMP40]], i64 2		; SSE-NEXT: [[TMP51:%.*]] = insertelement <4 x float> [[TMP50]], float [[TMP42]], i64 2
; SSE-NEXT: [[TMP52:%.*]] = insertelement <4 x float> [[TMP51]], float [[TMP44]], i64 3		; SSE-NEXT: [[TMP52:%.*]] = insertelement <4 x float> [[TMP51]], float [[TMP44]], i64 3
; SSE-NEXT: [[TMP53:%.*]] = fdiv <4 x float> [[TMP48]], [[TMP52]]		; SSE-NEXT: [[TMP53:%.*]] = fdiv <4 x float> [[TMP48]], [[TMP52]]
; SSE-NEXT: [[TMP54:%.*]] = bitcast float* [[TMP27]] to <4 x float>*		; SSE-NEXT: [[TMP54:%.*]] = bitcast float* [[TMP10]] to <4 x float>*
; SSE-NEXT: store <4 x float> [[TMP53]], <4 x float>* [[TMP54]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: store <4 x float> [[TMP53]], <4 x float>* [[TMP54]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_div(		; AVX-LABEL: @gather_load_div(
; AVX-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 4
; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4		; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; AVX-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10		; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; AVX-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13		; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; AVX-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
; AVX-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11		; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8
; AVX-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30
; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
; AVX-NEXT: [[TMP15:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
; AVX-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44		; AVX-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
; AVX-NEXT: [[TMP17:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP17:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
; AVX-NEXT: [[TMP18:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; AVX-NEXT: [[TMP18:%.*]] = load float, float* [[TMP1]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP19:%.*]] = load float, float* [[TMP18]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP19:%.*]] = load float, float* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP20:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33		; AVX-NEXT: [[TMP20:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP21:%.*]] = load float, float* [[TMP20]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP21:%.*]] = load float, float* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP22:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; AVX-NEXT: [[TMP22:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP23:%.*]] = load float, float* [[TMP22]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP23:%.*]] = load float, float* [[TMP7]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30		; AVX-NEXT: [[TMP24:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP25:%.*]] = load float, float* [[TMP24]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP25:%.*]] = load float, float* [[TMP9]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5		; AVX-NEXT: [[TMP26:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP27:%.*]] = load float, float* [[TMP26]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP27:%.*]] = load float, float* [[TMP11]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP28:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27		; AVX-NEXT: [[TMP28:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP29:%.*]] = load float, float* [[TMP28]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP29:%.*]] = load float, float* [[TMP13]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP30:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20		; AVX-NEXT: [[TMP30:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP31:%.*]] = load float, float* [[TMP30]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP31:%.*]] = load float, float* [[TMP15]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23		; AVX-NEXT: [[TMP32:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP33:%.*]] = load float, float* [[TMP32]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP33:%.*]] = load float, float* [[TMP17]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP34:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i64 0		; AVX-NEXT: [[TMP34:%.*]] = insertelement <8 x float> poison, float [[TMP18]], i64 0
; AVX-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP7]], i64 1		; AVX-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP20]], i64 1
; AVX-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP11]], i64 2		; AVX-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP22]], i64 2
; AVX-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[TMP15]], i64 3		; AVX-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[TMP24]], i64 3
; AVX-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP19]], i64 4		; AVX-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP26]], i64 4
; AVX-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[TMP23]], i64 5		; AVX-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[TMP28]], i64 5
; AVX-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP27]], i64 6		; AVX-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP30]], i64 6
; AVX-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP31]], i64 7		; AVX-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP32]], i64 7
; AVX-NEXT: [[TMP42:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i64 0		; AVX-NEXT: [[TMP42:%.*]] = insertelement <8 x float> poison, float [[TMP19]], i64 0
; AVX-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP9]], i64 1		; AVX-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP21]], i64 1
; AVX-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP13]], i64 2		; AVX-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP23]], i64 2
; AVX-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP17]], i64 3		; AVX-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP25]], i64 3
; AVX-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP21]], i64 4		; AVX-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP27]], i64 4
; AVX-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP25]], i64 5		; AVX-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP29]], i64 5
; AVX-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP29]], i64 6		; AVX-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP31]], i64 6
; AVX-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7		; AVX-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7
; AVX-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]		; AVX-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]
; AVX-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*		; AVX-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
; AVX-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_div(		; AVX2-LABEL: @gather_load_div(
; AVX2-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 4
; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; AVX2-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; AVX2-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13		; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; AVX2-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
; AVX2-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11		; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8
; AVX2-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30
; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
; AVX2-NEXT: [[TMP15:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
; AVX2-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44		; AVX2-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
; AVX2-NEXT: [[TMP17:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP17:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
; AVX2-NEXT: [[TMP18:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; AVX2-NEXT: [[TMP18:%.*]] = load float, float* [[TMP1]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP19:%.*]] = load float, float* [[TMP18]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP19:%.*]] = load float, float* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP20:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33		; AVX2-NEXT: [[TMP20:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP21:%.*]] = load float, float* [[TMP20]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP21:%.*]] = load float, float* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP22:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; AVX2-NEXT: [[TMP22:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP23:%.*]] = load float, float* [[TMP22]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP23:%.*]] = load float, float* [[TMP7]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30		; AVX2-NEXT: [[TMP24:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP25:%.*]] = load float, float* [[TMP24]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP25:%.*]] = load float, float* [[TMP9]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5		; AVX2-NEXT: [[TMP26:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP27:%.*]] = load float, float* [[TMP26]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP27:%.*]] = load float, float* [[TMP11]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP28:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27		; AVX2-NEXT: [[TMP28:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP29:%.*]] = load float, float* [[TMP28]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP29:%.*]] = load float, float* [[TMP13]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP30:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20		; AVX2-NEXT: [[TMP30:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP31:%.*]] = load float, float* [[TMP30]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP31:%.*]] = load float, float* [[TMP15]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23		; AVX2-NEXT: [[TMP32:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP33:%.*]] = load float, float* [[TMP32]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP33:%.*]] = load float, float* [[TMP17]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP34:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i64 0		; AVX2-NEXT: [[TMP34:%.*]] = insertelement <8 x float> poison, float [[TMP18]], i64 0
; AVX2-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP7]], i64 1		; AVX2-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP20]], i64 1
; AVX2-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP11]], i64 2		; AVX2-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP22]], i64 2
; AVX2-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[TMP15]], i64 3		; AVX2-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[TMP24]], i64 3
; AVX2-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP19]], i64 4		; AVX2-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP26]], i64 4
; AVX2-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[TMP23]], i64 5		; AVX2-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[TMP28]], i64 5
; AVX2-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP27]], i64 6		; AVX2-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP30]], i64 6
; AVX2-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP31]], i64 7		; AVX2-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP32]], i64 7
; AVX2-NEXT: [[TMP42:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i64 0		; AVX2-NEXT: [[TMP42:%.*]] = insertelement <8 x float> poison, float [[TMP19]], i64 0
; AVX2-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP9]], i64 1		; AVX2-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP21]], i64 1
; AVX2-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP13]], i64 2		; AVX2-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP23]], i64 2
; AVX2-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP17]], i64 3		; AVX2-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP25]], i64 3
; AVX2-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP21]], i64 4		; AVX2-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP27]], i64 4
; AVX2-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP25]], i64 5		; AVX2-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP29]], i64 5
; AVX2-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP29]], i64 6		; AVX2-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP31]], i64 6
; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7		; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7
; AVX2-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]		; AVX2-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]
; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*		; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
; AVX2-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512F-LABEL: @gather_load_div(		; AVX512F-LABEL: @gather_load_div(
; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i64 0		; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i64 0

llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll

; SSE-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5		; SSE-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
; SSE-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP15]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP15]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP17:%.*]] = add nsw i32 [[TMP16]], 4		; SSE-NEXT: [[TMP17:%.*]] = add nsw i32 [[TMP16]], 4
; SSE-NEXT: store i32 [[TMP17]], i32* [[TMP14]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: store i32 [[TMP17]], i32* [[TMP14]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_2(		; AVX-LABEL: @gather_load_2(
; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; AVX-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10		; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3
; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3		; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5		; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP9]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i64 0		; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP7]], i64 0
; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP6]], i64 1		; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i64 1
; AVX-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP8]], i64 2		; AVX-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i64 2
; AVX-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i64 3		; AVX-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i64 3
; AVX-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>		; AVX-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>
; AVX-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*		; AVX-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
; AVX-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_2(		; AVX2-LABEL: @gather_load_2(
; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; AVX2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10
; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 10		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3
; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5
; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 3		; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 5		; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP9]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i64 0		; AVX2-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP7]], i64 0
; AVX2-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP6]], i64 1		; AVX2-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i64 1
; AVX2-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP8]], i64 2		; AVX2-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i64 2
; AVX2-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i64 3		; AVX2-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i64 3
; AVX2-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>		; AVX2-NEXT: [[TMP15:%.*]] = add nsw <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>
; AVX2-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*		; AVX2-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0:%.*]] to <4 x i32>*
; AVX2-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512F-LABEL: @gather_load_2(		; AVX512F-LABEL: @gather_load_2(
; AVX512F-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1		; AVX512F-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 1
; SSE-NEXT: store i32 [[TMP28]], i32* [[TMP25]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: store i32 [[TMP28]], i32* [[TMP25]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; SSE-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; SSE-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4		; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4
; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_3(		; AVX-LABEL: @gather_load_3(
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11
; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11		; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
; AVX-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4		; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18
; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15		; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18		; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP10]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9		; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6		; AVX-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; AVX-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP9]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i64 0		; AVX-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP10]], i64 0
; AVX-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP5]], i64 1		; AVX-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP11]], i64 1
; AVX-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP7]], i64 2		; AVX-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP12]], i64 2
; AVX-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP9]], i64 3		; AVX-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP13]], i64 3
; AVX-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP11]], i64 4		; AVX-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP14]], i64 4
; AVX-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP13]], i64 5		; AVX-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP15]], i64 5
; AVX-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP15]], i64 6		; AVX-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP16]], i64 6
; AVX-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i64 7		; AVX-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i64 7
; AVX-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>		; AVX-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
; AVX-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*		; AVX-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*
; AVX-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_3(		; AVX2-LABEL: @gather_load_3(
; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11
; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 11		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
; AVX2-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18
; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15		; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18		; AVX2-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP10]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9		; AVX2-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6		; AVX2-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP14]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; AVX2-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP9]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i64 0		; AVX2-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP10]], i64 0
; AVX2-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP5]], i64 1		; AVX2-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP11]], i64 1
; AVX2-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP7]], i64 2		; AVX2-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP12]], i64 2
; AVX2-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP9]], i64 3		; AVX2-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP13]], i64 3
; AVX2-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP11]], i64 4		; AVX2-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP14]], i64 4
; AVX2-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP13]], i64 5		; AVX2-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP15]], i64 5
; AVX2-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP15]], i64 6		; AVX2-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP16]], i64 6
; AVX2-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i64 7		; AVX2-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i64 7
; AVX2-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>		; AVX2-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
; AVX2-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*		; AVX2-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*
; AVX2-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512F-LABEL: @gather_load_3(		; AVX512F-LABEL: @gather_load_3(
; AVX512F-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: ret void		; AVX512F-NEXT: ret void
;		;
; AVX512VL-LABEL: @gather_load_3(		; AVX512VL-LABEL: @gather_load_3(
; AVX512VL-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1		; AVX512VL-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1
; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1		; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1
; AVX512VL-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP6:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i64 0		; AVX512VL-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5
; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> poison, <4 x i32> zeroinitializer		; AVX512VL-NEXT: [[TMP7:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i64 0
; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>		; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP7]], <4 x i32*> poison, <4 x i32> zeroinitializer
; AVX512VL-NEXT: [[TMP8:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP7]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[TMP8:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
; AVX512VL-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1>		; AVX512VL-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP8]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5		; AVX512VL-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 2, i32 3, i32 4, i32 1>
; AVX512VL-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*		; AVX512VL-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*
; AVX512VL-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP11]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP11]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9		; AVX512VL-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
; AVX512VL-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP14:%.*]] = add i32 [[TMP13]], 2		; AVX512VL-NEXT: [[TMP14:%.*]] = add i32 [[TMP13]], 2
; AVX512VL-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6		; AVX512VL-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6
; AVX512VL-NEXT: store i32 [[TMP14]], i32* [[TMP10]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[TMP14]], i32* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6		; AVX512VL-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
; AVX512VL-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP18:%.*]] = add i32 [[TMP17]], 3		; AVX512VL-NEXT: [[TMP18:%.*]] = add i32 [[TMP17]], 3
; AVX512VL-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7		; AVX512VL-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7
; AVX512VL-NEXT: store i32 [[TMP18]], i32* [[TMP15]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[TMP18]], i32* [[TMP15]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21		; AVX512VL-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
; AVX512VL-NEXT: [[TMP21:%.*]] = load i32, i32* [[TMP20]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[TMP21:%.*]] = load i32, i32* [[TMP20]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP22:%.*]] = add i32 [[TMP21]], 4		; AVX512VL-NEXT: [[TMP22:%.*]] = add i32 [[TMP21]], 4
; AVX512F-NEXT: store i32 [[T20]], i32* [[T17]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: store i32 [[T20]], i32* [[T17]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]		; AVX512F-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
; AVX512F-NEXT: ret void		; AVX512F-NEXT: ret void
;		;
; AVX512VL-LABEL: @gather_load_4(		; AVX512VL-LABEL: @gather_load_4(
; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1		; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i64 0
; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer
; AVX512VL-NEXT: [[TMP2:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
; AVX512VL-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5		; AVX512VL-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
; AVX512VL-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9		; AVX512VL-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 9
; AVX512VL-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6		; AVX512VL-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
; AVX512VL-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6		; AVX512VL-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
; AVX512VL-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7		; AVX512VL-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
; AVX512VL-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21		; AVX512VL-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP2]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: [[T4:%.*]] = add i32 [[T3]], 1		; AVX512VL-NEXT: [[T4:%.*]] = add i32 [[T3]], 1
; AVX512VL-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP3]], <i32 2, i32 3, i32 4, i32 1>
; AVX512VL-NEXT: [[T24:%.*]] = add i32 [[T23]], 2		; AVX512VL-NEXT: [[T24:%.*]] = add i32 [[T23]], 2
; AVX512VL-NEXT: [[T28:%.*]] = add i32 [[T27]], 3		; AVX512VL-NEXT: [[T28:%.*]] = add i32 [[T27]], 3
; AVX512VL-NEXT: [[T32:%.*]] = add i32 [[T31]], 4		; AVX512VL-NEXT: [[T32:%.*]] = add i32 [[T31]], 4
; AVX512VL-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]
		; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1]], i64 0
		; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer
		; AVX512VL-NEXT: [[TMP2:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
		; AVX512VL-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP2]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]
		; AVX512VL-NEXT: [[TMP4:%.*]] = add <4 x i32> [[TMP3]], <i32 2, i32 3, i32 4, i32 1>
; AVX512VL-NEXT: [[TMP5:%.*]] = bitcast i32* [[T5]] to <4 x i32>*		; AVX512VL-NEXT: [[TMP5:%.*]] = bitcast i32* [[T5]] to <4 x i32>*
; AVX512VL-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]		; AVX512VL-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
; AVX512VL-NEXT: ret void		; AVX512VL-NEXT: ret void
;		;
%t5 = getelementptr inbounds i32, i32* %t0, i64 1		%t5 = getelementptr inbounds i32, i32* %t0, i64 1
store i32 %t32, i32* %t29, align 4, !tbaa !2		store i32 %t32, i32* %t29, align 4, !tbaa !2

ret void		ret void
}		}


define void @gather_load_div(float* noalias nocapture %0, float* noalias nocapture readonly %1) {		define void @gather_load_div(float* noalias nocapture %0, float* noalias nocapture readonly %1) {
; SSE-LABEL: @gather_load_div(		; SSE-LABEL: @gather_load_div(
; SSE-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 4
; SSE-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4		; SSE-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; SSE-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; SSE-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10		; SSE-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; SSE-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; SSE-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13		; SSE-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; SSE-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; SSE-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; SSE-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4
; SSE-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP11:%.*]] = load float, float* [[TMP1]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11		; SSE-NEXT: [[TMP12:%.*]] = load float, float* [[TMP3]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP13:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; SSE-NEXT: [[TMP14:%.*]] = load float, float* [[TMP5]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP15:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP15:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44		; SSE-NEXT: [[TMP16:%.*]] = load float, float* [[TMP7]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP17:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP17:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP18:%.*]] = insertelement <4 x float> poison, float [[TMP3]], i64 0		; SSE-NEXT: [[TMP18:%.*]] = load float, float* [[TMP9]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP19:%.*]] = insertelement <4 x float> [[TMP18]], float [[TMP7]], i64 1		; SSE-NEXT: [[TMP19:%.*]] = insertelement <4 x float> poison, float [[TMP11]], i64 0
; SSE-NEXT: [[TMP20:%.*]] = insertelement <4 x float> [[TMP19]], float [[TMP11]], i64 2		; SSE-NEXT: [[TMP20:%.*]] = insertelement <4 x float> [[TMP19]], float [[TMP13]], i64 1
; SSE-NEXT: [[TMP21:%.*]] = insertelement <4 x float> [[TMP20]], float [[TMP15]], i64 3		; SSE-NEXT: [[TMP21:%.*]] = insertelement <4 x float> [[TMP20]], float [[TMP15]], i64 2
; SSE-NEXT: [[TMP22:%.*]] = insertelement <4 x float> poison, float [[TMP5]], i64 0		; SSE-NEXT: [[TMP22:%.*]] = insertelement <4 x float> [[TMP21]], float [[TMP17]], i64 3
; SSE-NEXT: [[TMP23:%.*]] = insertelement <4 x float> [[TMP22]], float [[TMP9]], i64 1		; SSE-NEXT: [[TMP23:%.*]] = insertelement <4 x float> poison, float [[TMP12]], i64 0
; SSE-NEXT: [[TMP24:%.*]] = insertelement <4 x float> [[TMP23]], float [[TMP13]], i64 2		; SSE-NEXT: [[TMP24:%.*]] = insertelement <4 x float> [[TMP23]], float [[TMP14]], i64 1
; SSE-NEXT: [[TMP25:%.*]] = insertelement <4 x float> [[TMP24]], float [[TMP17]], i64 3		; SSE-NEXT: [[TMP25:%.*]] = insertelement <4 x float> [[TMP24]], float [[TMP16]], i64 2
; SSE-NEXT: [[TMP26:%.*]] = fdiv <4 x float> [[TMP21]], [[TMP25]]		; SSE-NEXT: [[TMP26:%.*]] = insertelement <4 x float> [[TMP25]], float [[TMP18]], i64 3
; SSE-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[TMP0:%.*]], i64 4		; SSE-NEXT: [[TMP27:%.*]] = fdiv <4 x float> [[TMP22]], [[TMP26]]
; SSE-NEXT: [[TMP28:%.*]] = bitcast float* [[TMP0]] to <4 x float>*		; SSE-NEXT: [[TMP28:%.*]] = bitcast float* [[TMP0]] to <4 x float>*
; SSE-NEXT: store <4 x float> [[TMP26]], <4 x float>* [[TMP28]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: store <4 x float> [[TMP27]], <4 x float>* [[TMP28]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; SSE-NEXT: [[TMP29:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
; SSE-NEXT: [[TMP30:%.*]] = load float, float* [[TMP29]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP30:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33		; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8
; SSE-NEXT: [[TMP32:%.*]] = load float, float* [[TMP31]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30
; SSE-NEXT: [[TMP33:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; SSE-NEXT: [[TMP33:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
; SSE-NEXT: [[TMP34:%.*]] = load float, float* [[TMP33]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP34:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
; SSE-NEXT: [[TMP35:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30		; SSE-NEXT: [[TMP35:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
; SSE-NEXT: [[TMP36:%.*]] = load float, float* [[TMP35]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP36:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
; SSE-NEXT: [[TMP37:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5		; SSE-NEXT: [[TMP37:%.*]] = load float, float* [[TMP29]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP38:%.*]] = load float, float* [[TMP37]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP38:%.*]] = load float, float* [[TMP30]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP39:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27		; SSE-NEXT: [[TMP39:%.*]] = load float, float* [[TMP31]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP40:%.*]] = load float, float* [[TMP39]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP40:%.*]] = load float, float* [[TMP32]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP41:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20		; SSE-NEXT: [[TMP41:%.*]] = load float, float* [[TMP33]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP42:%.*]] = load float, float* [[TMP41]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP42:%.*]] = load float, float* [[TMP34]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP43:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23		; SSE-NEXT: [[TMP43:%.*]] = load float, float* [[TMP35]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP44:%.*]] = load float, float* [[TMP43]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: [[TMP44:%.*]] = load float, float* [[TMP36]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: [[TMP45:%.*]] = insertelement <4 x float> poison, float [[TMP30]], i64 0		; SSE-NEXT: [[TMP45:%.*]] = insertelement <4 x float> poison, float [[TMP37]], i64 0
; SSE-NEXT: [[TMP46:%.*]] = insertelement <4 x float> [[TMP45]], float [[TMP34]], i64 1		; SSE-NEXT: [[TMP46:%.*]] = insertelement <4 x float> [[TMP45]], float [[TMP39]], i64 1
; SSE-NEXT: [[TMP47:%.*]] = insertelement <4 x float> [[TMP46]], float [[TMP38]], i64 2		; SSE-NEXT: [[TMP47:%.*]] = insertelement <4 x float> [[TMP46]], float [[TMP41]], i64 2
; SSE-NEXT: [[TMP48:%.*]] = insertelement <4 x float> [[TMP47]], float [[TMP42]], i64 3		; SSE-NEXT: [[TMP48:%.*]] = insertelement <4 x float> [[TMP47]], float [[TMP43]], i64 3
; SSE-NEXT: [[TMP49:%.*]] = insertelement <4 x float> poison, float [[TMP32]], i64 0		; SSE-NEXT: [[TMP49:%.*]] = insertelement <4 x float> poison, float [[TMP38]], i64 0
; SSE-NEXT: [[TMP50:%.*]] = insertelement <4 x float> [[TMP49]], float [[TMP36]], i64 1		; SSE-NEXT: [[TMP50:%.*]] = insertelement <4 x float> [[TMP49]], float [[TMP40]], i64 1
; SSE-NEXT: [[TMP51:%.*]] = insertelement <4 x float> [[TMP50]], float [[TMP40]], i64 2		; SSE-NEXT: [[TMP51:%.*]] = insertelement <4 x float> [[TMP50]], float [[TMP42]], i64 2
; SSE-NEXT: [[TMP52:%.*]] = insertelement <4 x float> [[TMP51]], float [[TMP44]], i64 3		; SSE-NEXT: [[TMP52:%.*]] = insertelement <4 x float> [[TMP51]], float [[TMP44]], i64 3
; SSE-NEXT: [[TMP53:%.*]] = fdiv <4 x float> [[TMP48]], [[TMP52]]		; SSE-NEXT: [[TMP53:%.*]] = fdiv <4 x float> [[TMP48]], [[TMP52]]
; SSE-NEXT: [[TMP54:%.*]] = bitcast float* [[TMP27]] to <4 x float>*		; SSE-NEXT: [[TMP54:%.*]] = bitcast float* [[TMP10]] to <4 x float>*
; SSE-NEXT: store <4 x float> [[TMP53]], <4 x float>* [[TMP54]], align 4, !tbaa [[TBAA0]]		; SSE-NEXT: store <4 x float> [[TMP53]], <4 x float>* [[TMP54]], align 4, !tbaa [[TBAA0]]
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @gather_load_div(		; AVX-LABEL: @gather_load_div(
; AVX-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 4
; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4		; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; AVX-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10		; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; AVX-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13		; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; AVX-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; AVX-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
; AVX-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11		; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8
; AVX-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30
; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; AVX-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
; AVX-NEXT: [[TMP15:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
; AVX-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44		; AVX-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
; AVX-NEXT: [[TMP17:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP17:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
; AVX-NEXT: [[TMP18:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; AVX-NEXT: [[TMP18:%.*]] = load float, float* [[TMP1]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP19:%.*]] = load float, float* [[TMP18]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP19:%.*]] = load float, float* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP20:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33		; AVX-NEXT: [[TMP20:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP21:%.*]] = load float, float* [[TMP20]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP21:%.*]] = load float, float* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP22:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; AVX-NEXT: [[TMP22:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP23:%.*]] = load float, float* [[TMP22]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP23:%.*]] = load float, float* [[TMP7]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30		; AVX-NEXT: [[TMP24:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP25:%.*]] = load float, float* [[TMP24]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP25:%.*]] = load float, float* [[TMP9]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5		; AVX-NEXT: [[TMP26:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP27:%.*]] = load float, float* [[TMP26]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP27:%.*]] = load float, float* [[TMP11]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP28:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27		; AVX-NEXT: [[TMP28:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP29:%.*]] = load float, float* [[TMP28]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP29:%.*]] = load float, float* [[TMP13]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP30:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20		; AVX-NEXT: [[TMP30:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP31:%.*]] = load float, float* [[TMP30]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP31:%.*]] = load float, float* [[TMP15]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23		; AVX-NEXT: [[TMP32:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP33:%.*]] = load float, float* [[TMP32]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: [[TMP33:%.*]] = load float, float* [[TMP17]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: [[TMP34:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i64 0		; AVX-NEXT: [[TMP34:%.*]] = insertelement <8 x float> poison, float [[TMP18]], i64 0
; AVX-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP7]], i64 1		; AVX-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP20]], i64 1
; AVX-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP11]], i64 2		; AVX-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP22]], i64 2
; AVX-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[TMP15]], i64 3		; AVX-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[TMP24]], i64 3
; AVX-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP19]], i64 4		; AVX-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP26]], i64 4
; AVX-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[TMP23]], i64 5		; AVX-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[TMP28]], i64 5
; AVX-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP27]], i64 6		; AVX-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP30]], i64 6
; AVX-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP31]], i64 7		; AVX-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP32]], i64 7
; AVX-NEXT: [[TMP42:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i64 0		; AVX-NEXT: [[TMP42:%.*]] = insertelement <8 x float> poison, float [[TMP19]], i64 0
; AVX-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP9]], i64 1		; AVX-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP21]], i64 1
; AVX-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP13]], i64 2		; AVX-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP23]], i64 2
; AVX-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP17]], i64 3		; AVX-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP25]], i64 3
; AVX-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP21]], i64 4		; AVX-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP27]], i64 4
; AVX-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP25]], i64 5		; AVX-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP29]], i64 5
; AVX-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP29]], i64 6		; AVX-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP31]], i64 6
; AVX-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7		; AVX-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7
; AVX-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]		; AVX-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]
; AVX-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*		; AVX-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
; AVX-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]		; AVX-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX2-LABEL: @gather_load_div(		; AVX2-LABEL: @gather_load_div(
; AVX2-NEXT: [[TMP3:%.*]] = load float, float* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 4
; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 4		; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10
; AVX2-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13
; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 10		; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3
; AVX2-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11
; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 13		; AVX2-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14
; AVX2-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44
; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 3		; AVX2-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17
; AVX2-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33
; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 11		; AVX2-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8
; AVX2-NEXT: [[TMP13:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30
; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 14		; AVX2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5
; AVX2-NEXT: [[TMP15:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27
; AVX2-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 44		; AVX2-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
; AVX2-NEXT: [[TMP17:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP17:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23
; AVX2-NEXT: [[TMP18:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 17		; AVX2-NEXT: [[TMP18:%.*]] = load float, float* [[TMP1]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP19:%.*]] = load float, float* [[TMP18]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP19:%.*]] = load float, float* [[TMP3]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP20:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 33		; AVX2-NEXT: [[TMP20:%.*]] = load float, float* [[TMP4]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP21:%.*]] = load float, float* [[TMP20]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP21:%.*]] = load float, float* [[TMP5]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP22:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 8		; AVX2-NEXT: [[TMP22:%.*]] = load float, float* [[TMP6]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP23:%.*]] = load float, float* [[TMP22]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP23:%.*]] = load float, float* [[TMP7]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 30		; AVX2-NEXT: [[TMP24:%.*]] = load float, float* [[TMP8]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP25:%.*]] = load float, float* [[TMP24]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP25:%.*]] = load float, float* [[TMP9]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 5		; AVX2-NEXT: [[TMP26:%.*]] = load float, float* [[TMP10]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP27:%.*]] = load float, float* [[TMP26]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP27:%.*]] = load float, float* [[TMP11]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP28:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 27		; AVX2-NEXT: [[TMP28:%.*]] = load float, float* [[TMP12]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP29:%.*]] = load float, float* [[TMP28]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP29:%.*]] = load float, float* [[TMP13]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP30:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20		; AVX2-NEXT: [[TMP30:%.*]] = load float, float* [[TMP14]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP31:%.*]] = load float, float* [[TMP30]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP31:%.*]] = load float, float* [[TMP15]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 23		; AVX2-NEXT: [[TMP32:%.*]] = load float, float* [[TMP16]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP33:%.*]] = load float, float* [[TMP32]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: [[TMP33:%.*]] = load float, float* [[TMP17]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: [[TMP34:%.*]] = insertelement <8 x float> poison, float [[TMP3]], i64 0		; AVX2-NEXT: [[TMP34:%.*]] = insertelement <8 x float> poison, float [[TMP18]], i64 0
; AVX2-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP7]], i64 1		; AVX2-NEXT: [[TMP35:%.*]] = insertelement <8 x float> [[TMP34]], float [[TMP20]], i64 1
; AVX2-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP11]], i64 2		; AVX2-NEXT: [[TMP36:%.*]] = insertelement <8 x float> [[TMP35]], float [[TMP22]], i64 2
; AVX2-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[TMP15]], i64 3		; AVX2-NEXT: [[TMP37:%.*]] = insertelement <8 x float> [[TMP36]], float [[TMP24]], i64 3
; AVX2-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP19]], i64 4		; AVX2-NEXT: [[TMP38:%.*]] = insertelement <8 x float> [[TMP37]], float [[TMP26]], i64 4
; AVX2-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[TMP23]], i64 5		; AVX2-NEXT: [[TMP39:%.*]] = insertelement <8 x float> [[TMP38]], float [[TMP28]], i64 5
; AVX2-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP27]], i64 6		; AVX2-NEXT: [[TMP40:%.*]] = insertelement <8 x float> [[TMP39]], float [[TMP30]], i64 6
; AVX2-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP31]], i64 7		; AVX2-NEXT: [[TMP41:%.*]] = insertelement <8 x float> [[TMP40]], float [[TMP32]], i64 7
; AVX2-NEXT: [[TMP42:%.*]] = insertelement <8 x float> poison, float [[TMP5]], i64 0		; AVX2-NEXT: [[TMP42:%.*]] = insertelement <8 x float> poison, float [[TMP19]], i64 0
; AVX2-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP9]], i64 1		; AVX2-NEXT: [[TMP43:%.*]] = insertelement <8 x float> [[TMP42]], float [[TMP21]], i64 1
; AVX2-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP13]], i64 2		; AVX2-NEXT: [[TMP44:%.*]] = insertelement <8 x float> [[TMP43]], float [[TMP23]], i64 2
; AVX2-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP17]], i64 3		; AVX2-NEXT: [[TMP45:%.*]] = insertelement <8 x float> [[TMP44]], float [[TMP25]], i64 3
; AVX2-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP21]], i64 4		; AVX2-NEXT: [[TMP46:%.*]] = insertelement <8 x float> [[TMP45]], float [[TMP27]], i64 4
; AVX2-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP25]], i64 5		; AVX2-NEXT: [[TMP47:%.*]] = insertelement <8 x float> [[TMP46]], float [[TMP29]], i64 5
; AVX2-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP29]], i64 6		; AVX2-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP31]], i64 6
; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7		; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7
; AVX2-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]		; AVX2-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]
; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*		; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
; AVX2-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]		; AVX2-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512F-LABEL: @gather_load_div(		; AVX512F-LABEL: @gather_load_div(
; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i64 0		; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i64 0

llvm/test/Transforms/SLPVectorizer/X86/remark_horcost.ll

	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[DIFF:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[DIFF:%.*]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = or i64 [[TMP1]], 4			; CHECK-NEXT: [[TMP2:%.*]] = or i64 [[TMP1]], 4
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP2]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP2]]
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 0			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 0
	; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[TMP1]], 1			; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[TMP1]], 1
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP3]]			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[TMP1]], 5			; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[TMP1]], 5
	; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP4]]			; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP4]]
				; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 1
	; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[TMP1]], 2			; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[TMP1]], 2
	; CHECK-NEXT: [[ARRAYIDX27:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP5]]			; CHECK-NEXT: [[ARRAYIDX27:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP5]]
	; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[TMP1]], 6			; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[TMP1]], 6
	; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP6]]			; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP6]]
				; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 2
	; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP1]], 3			; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP1]], 3
	; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP7]]			; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*			; CHECK-NEXT: [[TMP8:%.*]] = or i64 [[TMP1]], 7
	; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i32>, <4 x i32>* [[TMP8]], align 4			; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = or i64 [[TMP1]], 7			; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 3
	; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP10]]			; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
				; CHECK-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[ARRAYIDX2]] to <4 x i32>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[ARRAYIDX2]] to <4 x i32>*
	; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4			; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], [[TMP9]]			; CHECK-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], [[TMP10]]
	; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 1
	; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 2
	; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 3
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast i32* [[ARRAYIDX6]] to <4 x i32>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast i32* [[ARRAYIDX6]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 16			; CHECK-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 16
	; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP13]])			; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP13]])
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP15]], [[A_088]]			; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP15]], [[A_088]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:

llvm/test/Transforms/SLPVectorizer/X86/reorder_diamond_match.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake-avx512 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake-avx512 | FileCheck %s

	define void @test() {			define void @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, i8* undef, i64 4			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, i8* undef, i64 4
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* undef, i64 5			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* undef, i64 5
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* undef, i64 6			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* undef, i64 6
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* undef, i64 7			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* undef, i64 7
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i8* [[TMP1]] to <4 x i8>*			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 1, i64 0
	; CHECK-NEXT: [[TMP6:%.*]] = load <4 x i8>, <4 x i8>* [[TMP5]], align 1			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 1, i64 2
	; CHECK-NEXT: [[TMP7:%.*]] = zext <4 x i8> [[TMP6]] to <4 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 1, i64 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP7]], <4 x i32> poison, <4 x i32> <i32 1, i32 0, i32 3, i32 2>			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 1, i64 3
	; CHECK-NEXT: [[TMP8:%.*]] = sub nsw <4 x i32> zeroinitializer, [[SHUFFLE]]			; CHECK-NEXT: [[TMP9:%.*]] = bitcast i8* [[TMP1]] to <4 x i8>*
	; CHECK-NEXT: [[TMP9:%.*]] = shl nsw <4 x i32> [[TMP8]], zeroinitializer			; CHECK-NEXT: [[TMP10:%.*]] = load <4 x i8>, <4 x i8>* [[TMP9]], align 1
	; CHECK-NEXT: [[TMP10:%.*]] = add nsw <4 x i32> [[TMP9]], zeroinitializer			; CHECK-NEXT: [[TMP11:%.*]] = zext <4 x i8> [[TMP10]] to <4 x i32>
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i32> [[TMP10]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> poison, <4 x i32> <i32 1, i32 0, i32 3, i32 2>
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> poison, i32 [[TMP11]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = sub nsw <4 x i32> zeroinitializer, [[SHUFFLE]]
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x i32> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = shl nsw <4 x i32> [[TMP12]], zeroinitializer
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP13]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = add nsw <4 x i32> [[TMP13]], zeroinitializer
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[TMP10]], i32 3			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[TMP14]], i32 1
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x i32> [[TMP14]], i32 [[TMP15]], i32 2			; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x i32> poison, i32 [[TMP15]], i32 0
	; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x i32> [[TMP10]], i32 2			; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x i32> [[TMP14]], i32 0
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x i32> [[TMP16]], i32 [[TMP17]], i32 3			; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x i32> [[TMP16]], i32 [[TMP17]], i32 1
	; CHECK-NEXT: [[TMP19:%.*]] = add nsw <4 x i32> [[TMP10]], [[TMP18]]			; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x i32> [[TMP14]], i32 3
	; CHECK-NEXT: [[TMP20:%.*]] = sub nsw <4 x i32> [[TMP10]], [[TMP18]]			; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i32> [[TMP18]], i32 [[TMP19]], i32 2
	; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <4 x i32> [[TMP19]], <4 x i32> [[TMP20]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>			; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x i32> [[TMP14]], i32 2
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 1, i64 0			; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x i32> [[TMP20]], i32 [[TMP21]], i32 3
	; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 1, i64 2			; CHECK-NEXT: [[TMP23:%.*]] = add nsw <4 x i32> [[TMP14]], [[TMP22]]
	; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 1, i64 1			; CHECK-NEXT: [[TMP24:%.*]] = sub nsw <4 x i32> [[TMP14]], [[TMP22]]
	; CHECK-NEXT: [[TMP25:%.*]] = add nsw <4 x i32> zeroinitializer, [[TMP21]]			; CHECK-NEXT: [[TMP25:%.*]] = shufflevector <4 x i32> [[TMP23]], <4 x i32> [[TMP24]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
	; CHECK-NEXT: [[TMP26:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP21]]			; CHECK-NEXT: [[TMP26:%.*]] = add nsw <4 x i32> zeroinitializer, [[TMP25]]
	; CHECK-NEXT: [[TMP27:%.*]] = shufflevector <4 x i32> [[TMP25]], <4 x i32> [[TMP26]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>			; CHECK-NEXT: [[TMP27:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP25]]
	; CHECK-NEXT: [[TMP28:%.*]] = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 1, i64 3			; CHECK-NEXT: [[TMP28:%.*]] = shufflevector <4 x i32> [[TMP26]], <4 x i32> [[TMP27]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
	; CHECK-NEXT: [[TMP29:%.*]] = bitcast i32* [[TMP22]] to <4 x i32>*			; CHECK-NEXT: [[TMP29:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP27]], <4 x i32>* [[TMP29]], align 16			; CHECK-NEXT: store <4 x i32> [[TMP28]], <4 x i32>* [[TMP29]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%1 = getelementptr inbounds i8, i8* undef, i64 4			%1 = getelementptr inbounds i8, i8* undef, i64 4
	%2 = load i8, i8* %1, align 1			%2 = load i8, i8* %1, align 1
	%3 = zext i8 %2 to i32			%3 = zext i8 %2 to i32
	%4 = sub nsw i32 0, %3			%4 = sub nsw i32 0, %3
	%5 = shl nsw i32 %4, 0			%5 = shl nsw i32 %4, 0
	%6 = add nsw i32 %5, 0			%6 = add nsw i32 %5, 0

llvm/test/Transforms/SLPVectorizer/X86/resched.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

	%"struct.std::array" = type { [32 x i8] }			%"struct.std::array" = type { [32 x i8] }

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define fastcc void @_ZN12_GLOBAL__N_127PolynomialMultiplyRecognize9recognizeEv() unnamed_addr #0 align 2 {			define fastcc void @_ZN12_GLOBAL__N_127PolynomialMultiplyRecognize9recognizeEv() unnamed_addr #0 align 2 {
	; CHECK-LABEL: @_ZN12_GLOBAL__N_127PolynomialMultiplyRecognize9recognizeEv(			; CHECK-LABEL: @_ZN12_GLOBAL__N_127PolynomialMultiplyRecognize9recognizeEv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 undef, label [[IF_END50_I:%.*]], label [[IF_THEN22_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_END50_I:%.*]], label [[IF_THEN22_I:%.*]]
	; CHECK: if.then22.i:			; CHECK: if.then22.i:
	; CHECK-NEXT: [[SUB_I:%.*]] = add nsw i32 undef, -1
	; CHECK-NEXT: [[CONV31_I:%.*]] = and i32 undef, [[SUB_I]]
	; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 0			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 0
	; CHECK-NEXT: [[ARRAYIDX_I_I7_1_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 1			; CHECK-NEXT: [[ARRAYIDX_I_I7_1_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 1
	; CHECK-NEXT: [[ARRAYIDX_I_I7_2_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 2			; CHECK-NEXT: [[ARRAYIDX_I_I7_2_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 2
	; CHECK-NEXT: [[ARRAYIDX_I_I7_3_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 3			; CHECK-NEXT: [[ARRAYIDX_I_I7_3_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 3
	; CHECK-NEXT: [[ARRAYIDX_I_I7_4_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX_I_I7_4_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 4
	; CHECK-NEXT: [[ARRAYIDX_I_I7_5_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX_I_I7_5_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX_I_I7_6_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX_I_I7_6_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 6
	; CHECK-NEXT: [[ARRAYIDX_I_I7_7_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 7			; CHECK-NEXT: [[ARRAYIDX_I_I7_7_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[CONV31_I]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = lshr <8 x i32> [[SHUFFLE]], <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[ARRAYIDX_I_I7_8_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 8			; CHECK-NEXT: [[ARRAYIDX_I_I7_8_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 8
	; CHECK-NEXT: [[ARRAYIDX_I_I7_9_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 9			; CHECK-NEXT: [[ARRAYIDX_I_I7_9_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 9
	; CHECK-NEXT: [[ARRAYIDX_I_I7_10_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 10			; CHECK-NEXT: [[ARRAYIDX_I_I7_10_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 10
	; CHECK-NEXT: [[ARRAYIDX_I_I7_11_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 11			; CHECK-NEXT: [[ARRAYIDX_I_I7_11_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 11
				; CHECK-NEXT: [[ARRAYIDX_I_I7_12_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 12
				; CHECK-NEXT: [[ARRAYIDX_I_I7_13_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 13
				; CHECK-NEXT: [[ARRAYIDX_I_I7_14_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 14
				; CHECK-NEXT: [[ARRAYIDX_I_I7_15_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 15
				; CHECK-NEXT: [[SUB_I:%.*]] = add nsw i32 undef, -1
				; CHECK-NEXT: [[CONV31_I:%.*]] = and i32 undef, [[SUB_I]]
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[CONV31_I]], i32 0
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP2:%.*]] = lshr <8 x i32> [[SHUFFLE]], <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[CONV31_I]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[CONV31_I]], i32 0
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[SHUFFLE1]], <i32 9, i32 10, i32 11, i32 12>			; CHECK-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[SHUFFLE1]], <i32 9, i32 10, i32 11, i32 12>
	; CHECK-NEXT: [[ARRAYIDX_I_I7_12_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 12
	; CHECK-NEXT: [[SHR_12_I_I:%.*]] = lshr i32 [[CONV31_I]], 13			; CHECK-NEXT: [[SHR_12_I_I:%.*]] = lshr i32 [[CONV31_I]], 13
	; CHECK-NEXT: [[ARRAYIDX_I_I7_13_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 13
	; CHECK-NEXT: [[ARRAYIDX_I_I7_14_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 14
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[CONV31_I]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[CONV31_I]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[CONV31_I]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[CONV31_I]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = lshr <2 x i32> [[TMP6]], <i32 14, i32 15>			; CHECK-NEXT: [[TMP7:%.*]] = lshr <2 x i32> [[TMP6]], <i32 14, i32 15>
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> poison, i32 [[SUB_I]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> poison, i32 [[SUB_I]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <16 x i32> [[TMP8]], <16 x i32> [[TMP9]], <16 x i32> <i32 0, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <16 x i32> [[TMP8]], <16 x i32> [[TMP9]], <16 x i32> <i32 0, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <16 x i32> [[TMP10]], <16 x i32> [[TMP11]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 16, i32 17, i32 18, i32 19, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <16 x i32> [[TMP10]], <16 x i32> [[TMP11]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 16, i32 17, i32 18, i32 19, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x i32> [[TMP12]], i32 [[SHR_12_I_I]], i32 13			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x i32> [[TMP12]], i32 [[SHR_12_I_I]], i32 13
	; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <16 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <16 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <16 x i32> [[TMP13]], <16 x i32> [[TMP14]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 16, i32 17>			; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <16 x i32> [[TMP13]], <16 x i32> [[TMP14]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 16, i32 17>
	; CHECK-NEXT: [[TMP16:%.*]] = trunc <16 x i32> [[TMP15]] to <16 x i8>			; CHECK-NEXT: [[TMP16:%.*]] = trunc <16 x i32> [[TMP15]] to <16 x i8>
	; CHECK-NEXT: [[TMP17:%.*]] = and <16 x i8> [[TMP16]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			; CHECK-NEXT: [[TMP17:%.*]] = and <16 x i8> [[TMP16]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; CHECK-NEXT: [[ARRAYIDX_I_I7_15_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 15
	; CHECK-NEXT: [[TMP18:%.*]] = bitcast i8* [[TMP0]] to <16 x i8>*			; CHECK-NEXT: [[TMP18:%.*]] = bitcast i8* [[TMP0]] to <16 x i8>*
	; CHECK-NEXT: store <16 x i8> [[TMP17]], <16 x i8>* [[TMP18]], align 1			; CHECK-NEXT: store <16 x i8> [[TMP17]], <16 x i8>* [[TMP18]], align 1
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end50.i:			; CHECK: if.end50.i:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br i1 undef, label %if.end50.i, label %if.then22.i			br i1 undef, label %if.end50.i, label %if.then22.i

llvm/test/Transforms/SLPVectorizer/X86/return.ll

	; return ((x[0] + x[2]) + (x[1] + x[3]));			; return ((x[0] + x[2]) + (x[1] + x[3]));
	; }			; }

	define double @return2(double* nocapture readonly %x) {			define double @return2(double* nocapture readonly %x) {
	; CHECK-LABEL: @return2(			; CHECK-LABEL: @return2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds double, double* [[X:%.*]], i32 2			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds double, double* [[X:%.*]], i32 2
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[X]], i32 1			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[X]], i32 1
				; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[X]], i32 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[X]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[X]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[X]], i32 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX1]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX1]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
	; CHECK-NEXT: [[ADD5:%.*]] = fadd double [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[ADD5:%.*]] = fadd double [[TMP5]], [[TMP6]]
	; CHECK-NEXT: ret double [[ADD5]]			; CHECK-NEXT: ret double [[ADD5]]
	;			;

llvm/test/Transforms/SLPVectorizer/X86/reuse-extracts-in-wider-vect.ll

	; CHECK-NEXT: [[T9:%.*]] = getelementptr inbounds [3 x float], [3 x float]* [[T8]], i64 1, i64 0			; CHECK-NEXT: [[T9:%.*]] = getelementptr inbounds [3 x float], [3 x float]* [[T8]], i64 1, i64 0
	; CHECK-NEXT: [[T14:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 1, i64 0			; CHECK-NEXT: [[T14:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 1, i64 0
	; CHECK-NEXT: [[T11:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 1, i64 1			; CHECK-NEXT: [[T11:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 1, i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[T14]] to <2 x float>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[T14]] to <2 x float>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x float>, <2 x float>* [[TMP4]], align 4			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x float>, <2 x float>* [[TMP4]], align 4
	; CHECK-NEXT: br label [[T37:%.*]]			; CHECK-NEXT: br label [[T37:%.*]]
	; CHECK: t37:			; CHECK: t37:
	; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x float> [ [[TMP5]], [[TMP3:%.*]] ], [ [[T89:%.*]], [[T37]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x float> [ [[TMP5]], [[TMP3:%.*]] ], [ [[T89:%.*]], [[T37]] ]
	; CHECK-NEXT: [[TMP7:%.*]] = fdiv fast <2 x float> <float 1.000000e+00, float 1.000000e+00>, [[TMP6]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[T21:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 2, i64 0			; CHECK-NEXT: [[T21:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 2, i64 0
	; CHECK-NEXT: [[T25:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 2, i64 1			; CHECK-NEXT: [[T25:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 2, i64 1
	; CHECK-NEXT: [[T31:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 2, i64 2			; CHECK-NEXT: [[T31:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 2, i64 2
	; CHECK-NEXT: [[T33:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 2, i64 3			; CHECK-NEXT: [[T33:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[T4]], i64 0, i32 2, i64 3
				; CHECK-NEXT: [[TMP7:%.*]] = fdiv fast <2 x float> <float 1.000000e+00, float 1.000000e+00>, [[TMP6]]
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[T21]] to <4 x float>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[T21]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[SHUFFLE]], <4 x float>* [[TMP8]], align 4			; CHECK-NEXT: store <4 x float> [[SHUFFLE]], <4 x float>* [[TMP8]], align 4
	; CHECK-NEXT: [[T88:%.*]] = bitcast float* [[T9]] to <2 x float>*			; CHECK-NEXT: [[T88:%.*]] = bitcast float* [[T9]] to <2 x float>*
	; CHECK-NEXT: [[T89]] = load <2 x float>, <2 x float>* [[T88]], align 4			; CHECK-NEXT: [[T89]] = load <2 x float>, <2 x float>* [[T88]], align 4
	; CHECK-NEXT: br i1 undef, label [[T37]], label [[T55:%.*]]			; CHECK-NEXT: br i1 undef, label [[T37]], label [[T55:%.*]]
	; CHECK: t55:			; CHECK: t55:
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;

llvm/test/Transforms/SLPVectorizer/X86/schedule_budget.ll

	; CHECK-NEXT: [[B1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1			; CHECK-NEXT: [[B1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 1
	; CHECK-NEXT: [[B2:%.*]] = getelementptr inbounds float, float* [[B]], i64 2			; CHECK-NEXT: [[B2:%.*]] = getelementptr inbounds float, float* [[B]], i64 2
	; CHECK-NEXT: [[B3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3			; CHECK-NEXT: [[B3:%.*]] = getelementptr inbounds float, float* [[B]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[B]] to <4 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[B]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP1]], <4 x float>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x float> [[TMP1]], <4 x float>* [[TMP2]], align 4
	; CHECK-NEXT: [[C1:%.*]] = getelementptr inbounds float, float* [[C:%.*]], i64 1			; CHECK-NEXT: [[C1:%.*]] = getelementptr inbounds float, float* [[C:%.*]], i64 1
	; CHECK-NEXT: [[C2:%.*]] = getelementptr inbounds float, float* [[C]], i64 2			; CHECK-NEXT: [[C2:%.*]] = getelementptr inbounds float, float* [[C]], i64 2
	; CHECK-NEXT: [[C3:%.*]] = getelementptr inbounds float, float* [[C]], i64 3			; CHECK-NEXT: [[C3:%.*]] = getelementptr inbounds float, float* [[C]], i64 3
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[C]] to <4 x float>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
	; CHECK-NEXT: [[D1:%.*]] = getelementptr inbounds float, float* [[D:%.*]], i64 1			; CHECK-NEXT: [[D1:%.*]] = getelementptr inbounds float, float* [[D:%.*]], i64 1
	; CHECK-NEXT: [[D2:%.*]] = getelementptr inbounds float, float* [[D]], i64 2			; CHECK-NEXT: [[D2:%.*]] = getelementptr inbounds float, float* [[D]], i64 2
	; CHECK-NEXT: [[D3:%.*]] = getelementptr inbounds float, float* [[D]], i64 3			; CHECK-NEXT: [[D3:%.*]] = getelementptr inbounds float, float* [[D]], i64 3
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[C]] to <4 x float>*
				; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[D]] to <4 x float>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[D]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP4]], <4 x float>* [[TMP5]], align 4			; CHECK-NEXT: store <4 x float> [[TMP4]], <4 x float>* [[TMP5]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	; Don't vectorize these loads.			; Don't vectorize these loads.
	%l0 = load float, float* %a			%l0 = load float, float* %a
	%a1 = getelementptr inbounds float, float* %a, i64 1			%a1 = getelementptr inbounds float, float* %a, i64 1

llvm/test/Transforms/SLPVectorizer/X86/scheduling.ll

	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[DIFF:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[DIFF:%.*]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = or i64 [[TMP1]], 4			; CHECK-NEXT: [[TMP2:%.*]] = or i64 [[TMP1]], 4
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP2]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP2]]
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 0			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 0
	; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[TMP1]], 1			; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[TMP1]], 1
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP3]]			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[TMP1]], 5			; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[TMP1]], 5
	; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP4]]			; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP4]]
				; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 1
	; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[TMP1]], 2			; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[TMP1]], 2
	; CHECK-NEXT: [[ARRAYIDX27:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP5]]			; CHECK-NEXT: [[ARRAYIDX27:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP5]]
	; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[TMP1]], 6			; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[TMP1]], 6
	; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP6]]			; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP6]]
				; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 2
	; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP1]], 3			; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP1]], 3
	; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP7]]			; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*			; CHECK-NEXT: [[TMP8:%.*]] = or i64 [[TMP1]], 7
	; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i32>, <4 x i32>* [[TMP8]], align 4			; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = or i64 [[TMP1]], 7			; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 3
	; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP10]]			; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
				; CHECK-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* [[TMP9]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[ARRAYIDX2]] to <4 x i32>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[ARRAYIDX2]] to <4 x i32>*
	; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4			; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], [[TMP9]]			; CHECK-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], [[TMP10]]
	; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 1
	; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 2
	; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 3
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast i32* [[ARRAYIDX6]] to <4 x i32>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast i32* [[ARRAYIDX6]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 16			; CHECK-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 16
	; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP13]])			; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP13]])
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP15]], [[A_088]]			; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP15]], [[A_088]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:

llvm/test/Transforms/SLPVectorizer/X86/shift-ashr.ll

; AVX1-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8		; AVX1-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
; AVX1-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8		; AVX1-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
; AVX1-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		; AVX1-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
; AVX1-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		; AVX1-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
; AVX1-NEXT: ret void		; AVX1-NEXT: ret void
;		;
; AVX2-LABEL: @ashr_v8i64(		; AVX2-LABEL: @ashr_v8i64(
; AVX2-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP3:%.*]] = ashr <4 x i64> [[TMP1]], [[TMP2]]
; AVX2-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX2-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP5:%.*]] = ashr <4 x i64> [[TMP1]], [[TMP3]]		; AVX2-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX2-NEXT: [[TMP6:%.*]] = ashr <4 x i64> [[TMP2]], [[TMP4]]		; AVX2-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX2-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX2-NEXT: [[TMP6:%.*]] = ashr <4 x i64> [[TMP4]], [[TMP5]]
; AVX2-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX2-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX2-NEXT: ret void		; AVX2-NEXT: ret void
;		;
; AVX512-LABEL: @ashr_v8i64(		; AVX512-LABEL: @ashr_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = ashr <8 x i64> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = ashr <8 x i64> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @ashr_v8i64(		; XOP-LABEL: @ashr_v8i64(
; XOP-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; XOP-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; XOP-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; XOP-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; XOP-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; XOP-NEXT: [[TMP3:%.*]] = ashr <4 x i64> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; XOP-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; XOP-NEXT: [[TMP5:%.*]] = ashr <4 x i64> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; XOP-NEXT: [[TMP6:%.*]] = ashr <4 x i64> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; XOP-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; XOP-NEXT: [[TMP6:%.*]] = ashr <4 x i64> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; XOP-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8		%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8
%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8		%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8
%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8		%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8
%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8		%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8
%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8		%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8
; ... (25 lines not shown) ...
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @ashr_v16i32() {		define void @ashr_v16i32() {
; SSE-LABEL: @ashr_v16i32(		; SSE-LABEL: @ashr_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = ashr <4 x i32> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = ashr <4 x i32> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = ashr <4 x i32> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = ashr <4 x i32> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = ashr <4 x i32> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = ashr <4 x i32> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @ashr_v16i32(		; AVX-LABEL: @ashr_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = ashr <8 x i32> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = ashr <8 x i32> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = ashr <8 x i32> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = ashr <8 x i32> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @ashr_v16i32(		; AVX512-LABEL: @ashr_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = ashr <16 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = ashr <16 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @ashr_v16i32(		; XOP-LABEL: @ashr_v16i32(
; XOP-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; XOP-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; XOP-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; XOP-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; XOP-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; XOP-NEXT: [[TMP3:%.*]] = ashr <8 x i32> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; XOP-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; XOP-NEXT: [[TMP5:%.*]] = ashr <8 x i32> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; XOP-NEXT: [[TMP6:%.*]] = ashr <8 x i32> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; XOP-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; XOP-NEXT: [[TMP6:%.*]] = ashr <8 x i32> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; XOP-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4		%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4
%a1 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1 ), align 4		%a1 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1 ), align 4
%a2 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2 ), align 4		%a2 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2 ), align 4
%a3 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3 ), align 4		%a3 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3 ), align 4
%a4 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4 ), align 4		%a4 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4 ), align 4
; ... (57 lines not shown) ...
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @ashr_v32i16() {		define void @ashr_v32i16() {
; SSE-LABEL: @ashr_v32i16(		; SSE-LABEL: @ashr_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = ashr <8 x i16> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = ashr <8 x i16> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = ashr <8 x i16> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = ashr <8 x i16> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = ashr <8 x i16> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = ashr <8 x i16> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = ashr <8 x i16> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = ashr <8 x i16> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @ashr_v32i16(		; AVX-LABEL: @ashr_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = ashr <16 x i16> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = ashr <16 x i16> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = ashr <16 x i16> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = ashr <16 x i16> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @ashr_v32i16(		; AVX512-LABEL: @ashr_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = ashr <32 x i16> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = ashr <32 x i16> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @ashr_v32i16(		; XOP-LABEL: @ashr_v32i16(
; XOP-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; XOP-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; XOP-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; XOP-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; XOP-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; XOP-NEXT: [[TMP3:%.*]] = ashr <16 x i16> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; XOP-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; XOP-NEXT: [[TMP5:%.*]] = ashr <16 x i16> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; XOP-NEXT: [[TMP6:%.*]] = ashr <16 x i16> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; XOP-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; XOP-NEXT: [[TMP6:%.*]] = ashr <16 x i16> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; XOP-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2		%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2
%a1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 1 ), align 2		%a1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 1 ), align 2
%a2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 2 ), align 2		%a2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 2 ), align 2
%a3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 3 ), align 2		%a3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 3 ), align 2
%a4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4 ), align 2		%a4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4 ), align 2
; ... (121 lines not shown) ...
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @ashr_v64i8() {		define void @ashr_v64i8() {
; SSE-LABEL: @ashr_v64i8(		; SSE-LABEL: @ashr_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = ashr <16 x i8> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = ashr <16 x i8> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = ashr <16 x i8> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = ashr <16 x i8> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = ashr <16 x i8> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = ashr <16 x i8> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = ashr <16 x i8> [[TMP4]], [[TMP8]]		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = ashr <16 x i8> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @ashr_v64i8(		; AVX-LABEL: @ashr_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = ashr <32 x i8> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = ashr <32 x i8> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = ashr <32 x i8> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = ashr <32 x i8> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @ashr_v64i8(		; AVX512-LABEL: @ashr_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = ashr <64 x i8> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = ashr <64 x i8> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @ashr_v64i8(		; XOP-LABEL: @ashr_v64i8(
; XOP-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; XOP-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; XOP-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; XOP-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; XOP-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; XOP-NEXT: [[TMP3:%.*]] = ashr <32 x i8> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; XOP-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; XOP-NEXT: [[TMP5:%.*]] = ashr <32 x i8> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; XOP-NEXT: [[TMP6:%.*]] = ashr <32 x i8> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; XOP-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; XOP-NEXT: [[TMP6:%.*]] = ashr <32 x i8> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; XOP-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1		%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1
%a1 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 1 ), align 1		%a1 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 1 ), align 1
%a2 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 2 ), align 1		%a2 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 2 ), align 1
%a3 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 3 ), align 1		%a3 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 3 ), align 1
%a4 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 4 ), align 1		%a4 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 4 ), align 1

llvm/test/Transforms/SLPVectorizer/X86/shift-lshr.ll

@c16 = common global [32 x i16] zeroinitializer, align 64		@c16 = common global [32 x i16] zeroinitializer, align 64
@a8 = common global [64 x i8] zeroinitializer, align 64		@a8 = common global [64 x i8] zeroinitializer, align 64
@b8 = common global [64 x i8] zeroinitializer, align 64		@b8 = common global [64 x i8] zeroinitializer, align 64
@c8 = common global [64 x i8] zeroinitializer, align 64		@c8 = common global [64 x i8] zeroinitializer, align 64

define void @lshr_v8i64() {		define void @lshr_v8i64() {
; SSE-LABEL: @lshr_v8i64(		; SSE-LABEL: @lshr_v8i64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = lshr <2 x i64> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP6:%.*]] = lshr <2 x i64> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP9:%.*]] = lshr <2 x i64> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP10:%.*]] = lshr <2 x i64> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = lshr <2 x i64> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = lshr <2 x i64> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP12:%.*]] = lshr <2 x i64> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @lshr_v8i64(		; AVX-LABEL: @lshr_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = lshr <4 x i64> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = lshr <4 x i64> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = lshr <4 x i64> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = lshr <4 x i64> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @lshr_v8i64(		; AVX512-LABEL: @lshr_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = lshr <8 x i64> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = lshr <8 x i64> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @lshr_v8i64(		; XOP-LABEL: @lshr_v8i64(
; XOP-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; XOP-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; XOP-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; XOP-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; XOP-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; XOP-NEXT: [[TMP3:%.*]] = lshr <4 x i64> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; XOP-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; XOP-NEXT: [[TMP5:%.*]] = lshr <4 x i64> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; XOP-NEXT: [[TMP6:%.*]] = lshr <4 x i64> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; XOP-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; XOP-NEXT: [[TMP6:%.*]] = lshr <4 x i64> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; XOP-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8		%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8
%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8		%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8
%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8		%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8
%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8		%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8
%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8		%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8
store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8		store i64 %r6, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8		store i64 %r7, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
ret void		ret void
}		}

define void @lshr_v16i32() {		define void @lshr_v16i32() {
; SSE-LABEL: @lshr_v16i32(		; SSE-LABEL: @lshr_v16i32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP6:%.*]] = lshr <4 x i32> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP9:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP10:%.*]] = lshr <4 x i32> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: [[TMP11:%.*]] = lshr <4 x i32> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = lshr <4 x i32> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = lshr <4 x i32> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4		; SSE-NEXT: [[TMP10:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4		; SSE-NEXT: [[TMP12:%.*]] = lshr <4 x i32> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4		; SSE-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @lshr_v16i32(		; AVX-LABEL: @lshr_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = lshr <8 x i32> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = lshr <8 x i32> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @lshr_v16i32(		; AVX512-LABEL: @lshr_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = lshr <16 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = lshr <16 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @lshr_v16i32(		; XOP-LABEL: @lshr_v16i32(
; XOP-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; XOP-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; XOP-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; XOP-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; XOP-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; XOP-NEXT: [[TMP3:%.*]] = lshr <8 x i32> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; XOP-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; XOP-NEXT: [[TMP5:%.*]] = lshr <8 x i32> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; XOP-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; XOP-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; XOP-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; XOP-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4		%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4
%a1 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1 ), align 4		%a1 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1 ), align 4
%a2 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2 ), align 4		%a2 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2 ), align 4
%a3 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3 ), align 4		%a3 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3 ), align 4
%a4 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4 ), align 4		%a4 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4 ), align 4
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @lshr_v32i16() {		define void @lshr_v32i16() {
; SSE-LABEL: @lshr_v32i16(		; SSE-LABEL: @lshr_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = lshr <8 x i16> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i16> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = lshr <8 x i16> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = lshr <8 x i16> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = lshr <8 x i16> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = lshr <8 x i16> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = lshr <8 x i16> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = lshr <8 x i16> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @lshr_v32i16(		; AVX-LABEL: @lshr_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = lshr <16 x i16> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = lshr <16 x i16> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = lshr <16 x i16> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = lshr <16 x i16> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @lshr_v32i16(		; AVX512-LABEL: @lshr_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = lshr <32 x i16> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = lshr <32 x i16> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @lshr_v32i16(		; XOP-LABEL: @lshr_v32i16(
; XOP-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; XOP-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; XOP-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; XOP-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; XOP-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; XOP-NEXT: [[TMP3:%.*]] = lshr <16 x i16> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; XOP-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; XOP-NEXT: [[TMP5:%.*]] = lshr <16 x i16> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; XOP-NEXT: [[TMP6:%.*]] = lshr <16 x i16> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; XOP-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; XOP-NEXT: [[TMP6:%.*]] = lshr <16 x i16> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; XOP-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2		%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2
%a1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 1 ), align 2		%a1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 1 ), align 2
%a2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 2 ), align 2		%a2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 2 ), align 2
%a3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 3 ), align 2		%a3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 3 ), align 2
%a4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4 ), align 2		%a4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4 ), align 2
[121 lines not shown]
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @lshr_v64i8() {		define void @lshr_v64i8() {
; SSE-LABEL: @lshr_v64i8(		; SSE-LABEL: @lshr_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = lshr <16 x i8> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = lshr <16 x i8> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = lshr <16 x i8> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = lshr <16 x i8> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = lshr <16 x i8> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = lshr <16 x i8> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = lshr <16 x i8> [[TMP4]], [[TMP8]]		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = lshr <16 x i8> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @lshr_v64i8(		; AVX-LABEL: @lshr_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = lshr <32 x i8> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = lshr <32 x i8> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = lshr <32 x i8> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = lshr <32 x i8> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @lshr_v64i8(		; AVX512-LABEL: @lshr_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = lshr <64 x i8> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = lshr <64 x i8> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @lshr_v64i8(		; XOP-LABEL: @lshr_v64i8(
; XOP-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; XOP-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; XOP-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; XOP-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; XOP-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; XOP-NEXT: [[TMP3:%.*]] = lshr <32 x i8> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; XOP-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; XOP-NEXT: [[TMP5:%.*]] = lshr <32 x i8> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; XOP-NEXT: [[TMP6:%.*]] = lshr <32 x i8> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; XOP-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; XOP-NEXT: [[TMP6:%.*]] = lshr <32 x i8> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; XOP-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1		%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1
%a1 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 1 ), align 1		%a1 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 1 ), align 1
%a2 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 2 ), align 1		%a2 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 2 ), align 1
%a3 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 3 ), align 1		%a3 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 3 ), align 1
%a4 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 4 ), align 1		%a4 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 4 ), align 1
[253 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/shift-shl.ll

[17 lines not shown]
@c16 = common global [32 x i16] zeroinitializer, align 64		@c16 = common global [32 x i16] zeroinitializer, align 64
@a8 = common global [64 x i8] zeroinitializer, align 64		@a8 = common global [64 x i8] zeroinitializer, align 64
@b8 = common global [64 x i8] zeroinitializer, align 64		@b8 = common global [64 x i8] zeroinitializer, align 64
@c8 = common global [64 x i8] zeroinitializer, align 64		@c8 = common global [64 x i8] zeroinitializer, align 64

define void @shl_v8i64() {		define void @shl_v8i64() {
; SSE-LABEL: @shl_v8i64(		; SSE-LABEL: @shl_v8i64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @a64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = shl <2 x i64> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @b64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP6:%.*]] = shl <2 x i64> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP9:%.*]] = shl <2 x i64> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP10:%.*]] = shl <2 x i64> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: [[TMP11:%.*]] = shl <2 x i64> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = shl <2 x i64> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = shl <2 x i64> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP9]], <2 x i64>* bitcast ([8 x i64]* @c64 to <2 x i64>*), align 8		; SSE-NEXT: [[TMP10:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP11:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <2 x i64>*), align 8		; SSE-NEXT: [[TMP12:%.*]] = shl <2 x i64> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8		; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6) to <2 x i64>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @shl_v8i64(		; AVX-LABEL: @shl_v8i64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP3:%.*]] = shl <4 x i64> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; AVX-NEXT: [[TMP5:%.*]] = shl <4 x i64> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: [[TMP6:%.*]] = shl <4 x i64> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = shl <4 x i64> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; AVX-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @shl_v8i64(		; AVX512-LABEL: @shl_v8i64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8		; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
; AVX512-NEXT: [[TMP3:%.*]] = shl <8 x i64> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = shl <8 x i64> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8		; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @shl_v8i64(		; XOP-LABEL: @shl_v8i64(
; XOP-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8		; XOP-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
; XOP-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8		; XOP-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
; XOP-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8		; XOP-NEXT: [[TMP3:%.*]] = shl <4 x i64> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8		; XOP-NEXT: store <4 x i64> [[TMP3]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
; XOP-NEXT: [[TMP5:%.*]] = shl <4 x i64> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
; XOP-NEXT: [[TMP6:%.*]] = shl <4 x i64> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
; XOP-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8		; XOP-NEXT: [[TMP6:%.*]] = shl <4 x i64> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8		; XOP-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8		%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8
%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8		%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8
%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8		%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8
%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8		%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8
%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8		%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8
[92 lines not shown]
; SSE-NEXT: store i32 [[R12]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12), align 4		; SSE-NEXT: store i32 [[R12]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12), align 4
; SSE-NEXT: store i32 [[R13]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 13), align 4		; SSE-NEXT: store i32 [[R13]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 13), align 4
; SSE-NEXT: store i32 [[R14]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		; SSE-NEXT: store i32 [[R14]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
; SSE-NEXT: store i32 [[R15]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		; SSE-NEXT: store i32 [[R15]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @shl_v16i32(		; AVX-LABEL: @shl_v16i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP3:%.*]] = shl <8 x i32> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; AVX-NEXT: [[TMP5:%.*]] = shl <8 x i32> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: [[TMP6:%.*]] = shl <8 x i32> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; AVX-NEXT: [[TMP6:%.*]] = shl <8 x i32> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @shl_v16i32(		; AVX512-LABEL: @shl_v16i32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4		; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
; AVX512-NEXT: [[TMP3:%.*]] = shl <16 x i32> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = shl <16 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4		; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @shl_v16i32(		; XOP-LABEL: @shl_v16i32(
; XOP-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4		; XOP-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
; XOP-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4		; XOP-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
; XOP-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4		; XOP-NEXT: [[TMP3:%.*]] = shl <8 x i32> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4		; XOP-NEXT: store <8 x i32> [[TMP3]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
; XOP-NEXT: [[TMP5:%.*]] = shl <8 x i32> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
; XOP-NEXT: [[TMP6:%.*]] = shl <8 x i32> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
; XOP-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4		; XOP-NEXT: [[TMP6:%.*]] = shl <8 x i32> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4		; XOP-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4		%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4
%a1 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1 ), align 4		%a1 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1 ), align 4
%a2 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2 ), align 4		%a2 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2 ), align 4
%a3 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3 ), align 4		%a3 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3 ), align 4
%a4 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4 ), align 4		%a4 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4 ), align 4
[57 lines not shown]
store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4		store i32 %r14, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4		store i32 %r15, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
ret void		ret void
}		}

define void @shl_v32i16() {		define void @shl_v32i16() {
; SSE-LABEL: @shl_v32i16(		; SSE-LABEL: @shl_v32i16(
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = shl <8 x i16> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP3]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP6:%.*]] = shl <8 x i16> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP6]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP9:%.*]] = shl <8 x i16> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP10:%.*]] = shl <8 x i16> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: [[TMP11:%.*]] = shl <8 x i16> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = shl <8 x i16> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = shl <8 x i16> [[TMP4]], [[TMP8]]		; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2		; SSE-NEXT: [[TMP10:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP11:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2		; SSE-NEXT: [[TMP12:%.*]] = shl <8 x i16> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2		; SSE-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @shl_v32i16(		; AVX-LABEL: @shl_v32i16(
; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP3:%.*]] = shl <16 x i16> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; AVX-NEXT: [[TMP5:%.*]] = shl <16 x i16> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: [[TMP6:%.*]] = shl <16 x i16> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; AVX-NEXT: [[TMP6:%.*]] = shl <16 x i16> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @shl_v32i16(		; AVX512-LABEL: @shl_v32i16(
; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2		; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
; AVX512-NEXT: [[TMP3:%.*]] = shl <32 x i16> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = shl <32 x i16> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2		; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @shl_v32i16(		; XOP-LABEL: @shl_v32i16(
; XOP-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2		; XOP-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
; XOP-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2		; XOP-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
; XOP-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2		; XOP-NEXT: [[TMP3:%.*]] = shl <16 x i16> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2		; XOP-NEXT: store <16 x i16> [[TMP3]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
; XOP-NEXT: [[TMP5:%.*]] = shl <16 x i16> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
; XOP-NEXT: [[TMP6:%.*]] = shl <16 x i16> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
; XOP-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2		; XOP-NEXT: [[TMP6:%.*]] = shl <16 x i16> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2		; XOP-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2		%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2
%a1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 1 ), align 2		%a1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 1 ), align 2
%a2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 2 ), align 2		%a2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 2 ), align 2
%a3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 3 ), align 2		%a3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 3 ), align 2
%a4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4 ), align 2		%a4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4 ), align 2
store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2		store i16 %r30, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2		store i16 %r31, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
ret void		ret void
}		}

define void @shl_v64i8() {		define void @shl_v64i8() {
; SSE-LABEL: @shl_v64i8(		; SSE-LABEL: @shl_v64i8(
; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP3:%.*]] = shl <16 x i8> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP3]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP6:%.*]] = shl <16 x i8> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP6]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP9:%.*]] = shl <16 x i8> [[TMP1]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP10:%.*]] = shl <16 x i8> [[TMP2]], [[TMP6]]		; SSE-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: [[TMP11:%.*]] = shl <16 x i8> [[TMP3]], [[TMP7]]		; SSE-NEXT: [[TMP9:%.*]] = shl <16 x i8> [[TMP7]], [[TMP8]]
; SSE-NEXT: [[TMP12:%.*]] = shl <16 x i8> [[TMP4]], [[TMP8]]		; SSE-NEXT: [[TMP10:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1		; SSE-NEXT: [[TMP11:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1		; SSE-NEXT: [[TMP12:%.*]] = shl <16 x i8> [[TMP10]], [[TMP11]]
; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1		; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @shl_v64i8(		; AVX-LABEL: @shl_v64i8(
; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP3:%.*]] = shl <32 x i8> [[TMP1]], [[TMP2]]
; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; AVX-NEXT: [[TMP5:%.*]] = shl <32 x i8> [[TMP1]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: [[TMP6:%.*]] = shl <32 x i8> [[TMP2]], [[TMP4]]		; AVX-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; AVX-NEXT: [[TMP6:%.*]] = shl <32 x i8> [[TMP4]], [[TMP5]]
; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
; AVX512-LABEL: @shl_v64i8(		; AVX512-LABEL: @shl_v64i8(
; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1		; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
; AVX512-NEXT: [[TMP3:%.*]] = shl <64 x i8> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = shl <64 x i8> [[TMP1]], [[TMP2]]
; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1		; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; XOP-LABEL: @shl_v64i8(		; XOP-LABEL: @shl_v64i8(
; XOP-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1		; XOP-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
; XOP-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1		; XOP-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
; XOP-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1		; XOP-NEXT: [[TMP3:%.*]] = shl <32 x i8> [[TMP1]], [[TMP2]]
; XOP-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1		; XOP-NEXT: store <32 x i8> [[TMP3]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
; XOP-NEXT: [[TMP5:%.*]] = shl <32 x i8> [[TMP1]], [[TMP3]]		; XOP-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
; XOP-NEXT: [[TMP6:%.*]] = shl <32 x i8> [[TMP2]], [[TMP4]]		; XOP-NEXT: [[TMP5:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
; XOP-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1		; XOP-NEXT: [[TMP6:%.*]] = shl <32 x i8> [[TMP4]], [[TMP5]]
; XOP-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1		; XOP-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
; XOP-NEXT: ret void		; XOP-NEXT: ret void
;		;
%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1		%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1
%a1 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 1 ), align 1		%a1 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 1 ), align 1
%a2 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 2 ), align 1		%a2 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 2 ), align 1
%a3 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 3 ), align 1		%a3 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 3 ), align 1
%a4 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 4 ), align 1		%a4 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 4 ), align 1

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell < %s | FileCheck %s

	define void @wombat(i32* %ptr, i32* %ptr1) {			define void @wombat(i32* %ptr, i32* %ptr1) {
	; CHECK-LABEL: @wombat(			; CHECK-LABEL: @wombat(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[PTR:%.*]], i64 1			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[PTR:%.*]], i64 1
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 0			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 0
				; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, i32* [[PTR1:%.*]], i32 3
				; CHECK-NEXT: [[TMP34:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 4
				; CHECK-NEXT: [[TMP40:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 5
				; CHECK-NEXT: [[TMP46:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 6
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[TMP8]] to <2 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[TMP8]] to <2 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 8
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>
	; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, i32* [[PTR1:%.*]], i32 3
	; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>			; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[SHRINK_SHUFFLE]], <i32 -1, i32 -1>			; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[SHRINK_SHUFFLE]], <i32 -1, i32 -1>
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP34:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 4
	; CHECK-NEXT: [[TMP40:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 5
	; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], poison			; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], poison
	; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP3]], <4 x i32> poison, <4 x i32> [[SHUFFLE1]]			; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP3]], <4 x i32> poison, <4 x i32> [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> poison, <4 x i32> zeroinitializer, <4 x i32> [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> poison, <4 x i32> zeroinitializer, <4 x i32> [[TMP4]]
	; CHECK-NEXT: [[TMP46:%.*]] = getelementptr inbounds i32, i32* [[PTR1]], i32 6
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP27]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP27]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 8			; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%tmp7 = getelementptr inbounds i32, i32* %ptr, i64 1			%tmp7 = getelementptr inbounds i32, i32* %ptr, i64 1
	%tmp8 = getelementptr inbounds i32, i32* %ptr, i64 0			%tmp8 = getelementptr inbounds i32, i32* %ptr, i64 0
	%tmp12 = load i32, i32* %tmp7, align 4			%tmp12 = load i32, i32* %tmp7, align 4

llvm/test/Transforms/SLPVectorizer/X86/simple-loop.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	define i32 @rollable(i32* noalias nocapture %in, i32* noalias nocapture %out, i64 %n) {			define i32 @rollable(i32* noalias nocapture %in, i32* noalias nocapture %out, i64 %n) {
	; CHECK-LABEL: @rollable(			; CHECK-LABEL: @rollable(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 [[N:%.*]], 0			; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 [[N:%.*]], 0
	; CHECK-NEXT: br i1 [[TMP1]], label [[DOT_CRIT_EDGE:%.*]], label [[DOTLR_PH:%.*]]			; CHECK-NEXT: br i1 [[TMP1]], label [[DOT_CRIT_EDGE:%.*]], label [[DOTLR_PH:%.*]]
	; CHECK: .lr.ph:			; CHECK: .lr.ph:
	; CHECK-NEXT: [[I_019:%.*]] = phi i64 [ [[TMP10:%.*]], [[DOTLR_PH]] ], [ 0, [[TMP0:%.*]] ]			; CHECK-NEXT: [[I_019:%.*]] = phi i64 [ [[TMP10:%.*]], [[DOTLR_PH]] ], [ 0, [[TMP0:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[I_019]], 2			; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[I_019]], 2
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 [[TMP2]]
	; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* [[TMP4]], align 4			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
	; CHECK-NEXT: [[TMP6:%.*]] = mul <4 x i32> [[TMP5]], <i32 7, i32 7, i32 7, i32 7>			; CHECK-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* [[TMP5]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[TMP6]], <i32 7, i32 14, i32 21, i32 28>			; CHECK-NEXT: [[TMP7:%.*]] = mul <4 x i32> [[TMP6]], <i32 7, i32 7, i32 7, i32 7>
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 [[TMP2]]			; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i32> [[TMP7]], <i32 7, i32 14, i32 21, i32 28>
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <4 x i32>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP4]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP9]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP9]], align 4
	; CHECK-NEXT: [[TMP10]] = add i64 [[I_019]], 1			; CHECK-NEXT: [[TMP10]] = add i64 [[I_019]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[TMP10]], [[N]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[TMP10]], [[N]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[DOT_CRIT_EDGE]], label [[DOTLR_PH]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[DOT_CRIT_EDGE]], label [[DOTLR_PH]]
	; CHECK: ._crit_edge:			; CHECK: ._crit_edge:
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%1 = icmp eq i64 %n, 0			%1 = icmp eq i64 %n, 0
	br i1 %1, label %._crit_edge, label %.lr.ph			br i1 %1, label %._crit_edge, label %.lr.ph

llvm/test/Transforms/SLPVectorizer/X86/simplebb.ll

%arrayidx5 = getelementptr inbounds double, double* %c, i64 1		%arrayidx5 = getelementptr inbounds double, double* %c, i64 1
store double %mul5, double* %arrayidx5, align 8		store double %mul5, double* %arrayidx5, align 8
ret void		ret void
}		}

; Simple 3-pair chain with loads and stores, obfuscated with bitcasts		; Simple 3-pair chain with loads and stores, obfuscated with bitcasts
define void @test2(double* %a, double* %b, i8* %e) {		define void @test2(double* %a, double* %b, i8* %e) {
; CHECK-LABEL: @test2(		; CHECK-LABEL: @test2(
		; CHECK-NEXT: [[C:%.*]] = bitcast i8* [[E:%.*]] to double*
; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*		; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8
; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*		; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*
; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 8		; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 8
; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[C:%.*]] = bitcast i8* [[E:%.*]] to double*
; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[C]] to <2 x double>*		; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[C]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8		; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%i0 = load double, double* %a, align 8		%i0 = load double, double* %a, align 8
%i1 = load double, double* %b, align 8		%i1 = load double, double* %b, align 8
%mul = fmul double %i0, %i1		%mul = fmul double %i0, %i1
%arrayidx3 = getelementptr inbounds double, double* %a, i64 1		%arrayidx3 = getelementptr inbounds double, double* %a, i64 1
%i3 = load double, double* %arrayidx3, align 8		%i3 = load double, double* %arrayidx3, align 8
%arrayidx4 = getelementptr inbounds double, double* %b, i64 1		%arrayidx4 = getelementptr inbounds double, double* %b, i64 1
%i4 = load double, double* %arrayidx4, align 8		%i4 = load double, double* %arrayidx4, align 8
%mul5 = fmul double %i3, %i4		%mul5 = fmul double %i3, %i4
%c = bitcast i8* %e to double*		%c = bitcast i8* %e to double*
store double %mul, double* %c, align 8		store double %mul, double* %c, align 8
%carrayidx5 = getelementptr inbounds i8, i8* %e, i64 8		%carrayidx5 = getelementptr inbounds i8, i8* %e, i64 8
%arrayidx5 = bitcast i8* %carrayidx5 to double*		%arrayidx5 = bitcast i8* %carrayidx5 to double*
store double %mul5, double* %arrayidx5, align 8		store double %mul5, double* %arrayidx5, align 8
ret void		ret void
}		}

; Don't vectorize volatile loads.		; Don't vectorize volatile loads.
define void @test_volatile_load(double* %a, double* %b, double* %c) {		define void @test_volatile_load(double* %a, double* %b, double* %c) {
; CHECK-LABEL: @test_volatile_load(		; CHECK-LABEL: @test_volatile_load(
; CHECK-NEXT: [[I0:%.*]] = load volatile double, double* [[A:%.*]], align 8		; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 1
; CHECK-NEXT: [[I1:%.*]] = load volatile double, double* [[B:%.*]], align 8		; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 1
; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A]], i64 1		; CHECK-NEXT: [[I0:%.*]] = load volatile double, double* [[A]], align 8
		; CHECK-NEXT: [[I1:%.*]] = load volatile double, double* [[B]], align 8
; CHECK-NEXT: [[I3:%.*]] = load double, double* [[ARRAYIDX3]], align 8		; CHECK-NEXT: [[I3:%.*]] = load double, double* [[ARRAYIDX3]], align 8
; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds double, double* [[B]], i64 1
; CHECK-NEXT: [[I4:%.*]] = load double, double* [[ARRAYIDX4]], align 8		; CHECK-NEXT: [[I4:%.*]] = load double, double* [[ARRAYIDX4]], align 8
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[I0]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[I0]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[I3]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[I3]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[I1]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[I1]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[I4]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[I4]], i32 1
; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[C:%.*]] to <2 x double>*		; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[C:%.*]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8		; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8

llvm/test/Transforms/SLPVectorizer/X86/sitofp-inseltpoison.ll

; AVX512-LABEL: @sitofp_8i64_8f64(		; AVX512-LABEL: @sitofp_8i64_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i64> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i64> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; AVX256DQ-LABEL: @sitofp_8i64_8f64(		; AVX256DQ-LABEL: @sitofp_8i64_8f64(
; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64
; AVX256DQ-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32		; AVX256DQ-NEXT: [[TMP2:%.*]] = sitofp <4 x i64> [[TMP1]] to <4 x double>
; AVX256DQ-NEXT: [[TMP3:%.*]] = sitofp <4 x i64> [[TMP1]] to <4 x double>		; AVX256DQ-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256DQ-NEXT: [[TMP4:%.*]] = sitofp <4 x i64> [[TMP2]] to <4 x double>		; AVX256DQ-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32
; AVX256DQ-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256DQ-NEXT: [[TMP4:%.*]] = sitofp <4 x i64> [[TMP3]] to <4 x double>
; AVX256DQ-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256DQ-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256DQ-NEXT: ret void		; AVX256DQ-NEXT: ret void
;		;
%ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64		%ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
%ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8		%ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
%ld2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 2), align 16		%ld2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 2), align 16
%ld3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 3), align 8		%ld3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 3), align 8
%ld4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4), align 32		%ld4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4), align 32
; ... 33 lines not shown ...
store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64		store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64
store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @sitofp_4i32_4f64() #0 {		define void @sitofp_4i32_4f64() #0 {
; SSE-LABEL: @sitofp_4i32_4f64(		; SSE-LABEL: @sitofp_4i32_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i32> [[TMP2]] to <2 x double>		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i32> [[TMP3]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_4i32_4f64(		; AVX-LABEL: @sitofp_4i32_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x double>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x double>
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
; ... 11 lines not shown ...
store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16		store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16
store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @sitofp_8i32_8f64() #0 {		define void @sitofp_8i32_8f64() #0 {
; SSE-LABEL: @sitofp_8i32_8f64(		; SSE-LABEL: @sitofp_8i32_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <2 x i32>*), align 16		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 6) to <2 x i32>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x double>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i32> [[TMP3]] to <2 x double>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i32> [[TMP2]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <2 x i32> [[TMP3]] to <2 x double>		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <2 x i32>*), align 16
; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i32> [[TMP4]] to <2 x double>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i32> [[TMP5]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 6) to <2 x i32>*), align 8
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i32> [[TMP7]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_8i32_8f64(		; AVX256-LABEL: @sitofp_8i32_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x double>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x double>		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[TMP2]] to <4 x double>		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[TMP3]] to <4 x double>
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_8i32_8f64(		; AVX512-LABEL: @sitofp_8i32_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
; ... 39 lines not shown ...
store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64		store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64
store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @sitofp_4i16_4f64() #0 {		define void @sitofp_4i16_4f64() #0 {
; SSE-LABEL: @sitofp_4i16_4f64(		; SSE-LABEL: @sitofp_4i16_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i16> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <2 x i16> [[TMP1]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[TMP2]] to <2 x double>		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[TMP3]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_4i16_4f64(		; AVX-LABEL: @sitofp_4i16_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x double>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x double>
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
; ... 11 lines not shown ...
store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16		store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16
store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @sitofp_8i16_8f64() #0 {		define void @sitofp_8i16_8f64() #0 {
; SSE-LABEL: @sitofp_8i16_8f64(		; SSE-LABEL: @sitofp_8i16_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i16> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <2 x i16>*), align 8		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6) to <2 x i16>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP1]] to <2 x double>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[TMP3]] to <2 x double>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i16> [[TMP2]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <2 x i16> [[TMP3]] to <2 x double>		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <2 x i16>*), align 8
; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x double>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i16> [[TMP5]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6) to <2 x i16>*), align 4
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i16> [[TMP7]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_8i16_8f64(		; AVX256-LABEL: @sitofp_8i16_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x double>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x double>		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x double>		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i16> [[TMP3]] to <4 x double>
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_8i16_8f64(		; AVX512-LABEL: @sitofp_8i16_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
; ... 39 lines not shown ...
store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64		store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64
store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @sitofp_4i8_4f64() #0 {		define void @sitofp_4i8_4f64() #0 {
; SSE-LABEL: @sitofp_4i8_4f64(		; SSE-LABEL: @sitofp_4i8_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i8> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <2 x i8> [[TMP1]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i8> [[TMP2]] to <2 x double>		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i8> [[TMP3]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_4i8_4f64(		; AVX-LABEL: @sitofp_4i8_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x double>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x double>
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
; ... 11 lines not shown ...
store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16		store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16
store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @sitofp_8i8_8f64() #0 {		define void @sitofp_8i8_8f64() #0 {
; SSE-LABEL: @sitofp_8i8_8f64(		; SSE-LABEL: @sitofp_8i8_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i8> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <2 x i8>*), align 4		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 6) to <2 x i8>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = sitofp <2 x i8> [[TMP1]] to <2 x double>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i8> [[TMP3]] to <2 x double>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i8> [[TMP2]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <2 x i8> [[TMP3]] to <2 x double>		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <2 x i8>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i8> [[TMP4]] to <2 x double>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i8> [[TMP5]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 6) to <2 x i8>*), align 2
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i8> [[TMP7]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_8i8_8f64(		; AVX256-LABEL: @sitofp_8i8_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x double>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x double>		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i8> [[TMP2]] to <4 x double>		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i8> [[TMP3]] to <4 x double>
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_8i8_8f64(		; AVX512-LABEL: @sitofp_8i8_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
; ... 68 lines not shown ...
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @sitofp_8i64_8f32() #0 {		define void @sitofp_8i64_8f32() #0 {
; SSE-LABEL: @sitofp_8i64_8f32(		; SSE-LABEL: @sitofp_8i64_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i64> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <4 x i64> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i64> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i64> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_8i64_8f32(		; AVX-LABEL: @sitofp_8i64_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i64> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i64> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
; ... 45 lines not shown ...
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @sitofp_8i32_8f32() #0 {		define void @sitofp_8i32_8f32() #0 {
; SSE-LABEL: @sitofp_8i32_8f32(		; SSE-LABEL: @sitofp_8i32_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_8i32_8f32(		; AVX-LABEL: @sitofp_8i32_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8		store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @sitofp_16i32_16f32() #0 {		define void @sitofp_16i32_16f32() #0 {
; SSE-LABEL: @sitofp_16i32_16f32(		; SSE-LABEL: @sitofp_16i32_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <4 x i32>*), align 32		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 12) to <4 x i32>*), align 16		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16
; SSE-NEXT: [[TMP5:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x float>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[TMP3]] to <4 x float>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i32> [[TMP2]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <4 x i32> [[TMP3]] to <4 x float>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <4 x i32>*), align 32
; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[TMP4]] to <4 x float>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i32> [[TMP5]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 12) to <4 x i32>*), align 16
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[TMP7]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_16i32_16f32(		; AVX256-LABEL: @sitofp_16i32_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <8 x i32>*), align 32		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x float>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x float>		; AVX256-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i32> [[TMP2]] to <8 x float>		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <8 x i32>*), align 32
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i32> [[TMP3]] to <8 x float>
; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_16i32_16f32(		; AVX512-LABEL: @sitofp_16i32_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @src32 to <16 x i32>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @src32 to <16 x i32>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i32> [[TMP1]] to <16 x float>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i32> [[TMP1]] to <16 x float>
; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64		; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @sitofp_8i16_8f32() #0 {		define void @sitofp_8i16_8f32() #0 {
; SSE-LABEL: @sitofp_8i16_8f32(		; SSE-LABEL: @sitofp_8i16_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i16> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_8i16_8f32(		; AVX-LABEL: @sitofp_8i16_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8		store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @sitofp_16i16_16f32() #0 {		define void @sitofp_16i16_16f32() #0 {
; SSE-LABEL: @sitofp_16i16_16f32(		; SSE-LABEL: @sitofp_16i16_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <4 x i16>*), align 16		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 12) to <4 x i16>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x float>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i16> [[TMP3]] to <4 x float>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <4 x i16> [[TMP3]] to <4 x float>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <4 x i16>*), align 16
; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i16> [[TMP4]] to <4 x float>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i16> [[TMP5]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 12) to <4 x i16>*), align 8
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i16> [[TMP7]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_16i16_16f32(		; AVX256-LABEL: @sitofp_16i16_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>		; AVX256-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i16> [[TMP2]] to <8 x float>		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i16> [[TMP3]] to <8 x float>
; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_16i16_16f32(		; AVX512-LABEL: @sitofp_16i16_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @src16 to <16 x i16>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @src16 to <16 x i16>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i16> [[TMP1]] to <16 x float>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i16> [[TMP1]] to <16 x float>
; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64		; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @sitofp_8i8_8f32() #0 {		define void @sitofp_8i8_8f32() #0 {
; SSE-LABEL: @sitofp_8i8_8f32(		; SSE-LABEL: @sitofp_8i8_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i8> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i8> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_8i8_8f32(		; AVX-LABEL: @sitofp_8i8_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8		store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @sitofp_16i8_16f32() #0 {		define void @sitofp_16i8_16f32() #0 {
; SSE-LABEL: @sitofp_16i8_16f32(		; SSE-LABEL: @sitofp_16i8_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <4 x i8>*), align 8		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 12) to <4 x i8>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x float>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i8> [[TMP3]] to <4 x float>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i8> [[TMP2]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <4 x i8> [[TMP3]] to <4 x float>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <4 x i8>*), align 8
; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i8> [[TMP4]] to <4 x float>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i8> [[TMP5]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 12) to <4 x i8>*), align 4
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i8> [[TMP7]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_16i8_16f32(		; AVX256-LABEL: @sitofp_16i8_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <8 x i8>*), align 8		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x float>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x float>		; AVX256-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i8> [[TMP2]] to <8 x float>		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <8 x i8>*), align 8
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i8> [[TMP3]] to <8 x float>
; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_16i8_16f32(		; AVX512-LABEL: @sitofp_16i8_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @src8 to <16 x i8>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @src8 to <16 x i8>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i8> [[TMP1]] to <16 x float>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i8> [[TMP1]] to <16 x float>
; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64		; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/sitofp.ll

; AVX512-LABEL: @sitofp_8i64_8f64(		; AVX512-LABEL: @sitofp_8i64_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i64> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i64> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
; AVX256DQ-LABEL: @sitofp_8i64_8f64(		; AVX256DQ-LABEL: @sitofp_8i64_8f64(
; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64		; AVX256DQ-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64
; AVX256DQ-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32		; AVX256DQ-NEXT: [[TMP2:%.*]] = sitofp <4 x i64> [[TMP1]] to <4 x double>
; AVX256DQ-NEXT: [[TMP3:%.*]] = sitofp <4 x i64> [[TMP1]] to <4 x double>		; AVX256DQ-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256DQ-NEXT: [[TMP4:%.*]] = sitofp <4 x i64> [[TMP2]] to <4 x double>		; AVX256DQ-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32
; AVX256DQ-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256DQ-NEXT: [[TMP4:%.*]] = sitofp <4 x i64> [[TMP3]] to <4 x double>
; AVX256DQ-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256DQ-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256DQ-NEXT: ret void		; AVX256DQ-NEXT: ret void
;		;
%ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64		%ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
%ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8		%ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
%ld2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 2), align 16		%ld2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 2), align 16
%ld3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 3), align 8		%ld3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 3), align 8
%ld4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4), align 32		%ld4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4), align 32
store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64		store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64
store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @sitofp_4i32_4f64() #0 {		define void @sitofp_4i32_4f64() #0 {
; SSE-LABEL: @sitofp_4i32_4f64(		; SSE-LABEL: @sitofp_4i32_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i32> [[TMP2]] to <2 x double>		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i32> [[TMP3]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_4i32_4f64(		; AVX-LABEL: @sitofp_4i32_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x double>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x double>
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16		store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16
store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @sitofp_8i32_8f64() #0 {		define void @sitofp_8i32_8f64() #0 {
; SSE-LABEL: @sitofp_8i32_8f64(		; SSE-LABEL: @sitofp_8i32_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <2 x i32>*), align 16		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 6) to <2 x i32>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = sitofp <2 x i32> [[TMP1]] to <2 x double>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i32> [[TMP3]] to <2 x double>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i32> [[TMP2]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <2 x i32> [[TMP3]] to <2 x double>		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <2 x i32>*), align 16
; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i32> [[TMP4]] to <2 x double>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i32> [[TMP5]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 6) to <2 x i32>*), align 8
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i32> [[TMP7]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_8i32_8f64(		; AVX256-LABEL: @sitofp_8i32_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x double>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x double>		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[TMP2]] to <4 x double>		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[TMP3]] to <4 x double>
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_8i32_8f64(		; AVX512-LABEL: @sitofp_8i32_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64		store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64
store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @sitofp_4i16_4f64() #0 {		define void @sitofp_4i16_4f64() #0 {
; SSE-LABEL: @sitofp_4i16_4f64(		; SSE-LABEL: @sitofp_4i16_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i16> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <2 x i16> [[TMP1]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[TMP2]] to <2 x double>		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[TMP3]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_4i16_4f64(		; AVX-LABEL: @sitofp_4i16_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x double>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x double>
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16		store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16
store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @sitofp_8i16_8f64() #0 {		define void @sitofp_8i16_8f64() #0 {
; SSE-LABEL: @sitofp_8i16_8f64(		; SSE-LABEL: @sitofp_8i16_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i16> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <2 x i16>*), align 8		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6) to <2 x i16>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = sitofp <2 x i16> [[TMP1]] to <2 x double>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i16> [[TMP3]] to <2 x double>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i16> [[TMP2]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <2 x i16> [[TMP3]] to <2 x double>		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <2 x i16>*), align 8
; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i16> [[TMP4]] to <2 x double>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i16> [[TMP5]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6) to <2 x i16>*), align 4
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i16> [[TMP7]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_8i16_8f64(		; AVX256-LABEL: @sitofp_8i16_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x double>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x double>		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x double>		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i16> [[TMP3]] to <4 x double>
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_8i16_8f64(		; AVX512-LABEL: @sitofp_8i16_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64		store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64
store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @sitofp_4i8_4f64() #0 {		define void @sitofp_4i8_4f64() #0 {
; SSE-LABEL: @sitofp_4i8_4f64(		; SSE-LABEL: @sitofp_4i8_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i8> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <2 x i8> [[TMP1]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i8> [[TMP2]] to <2 x double>		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i8> [[TMP3]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_4i8_4f64(		; AVX-LABEL: @sitofp_4i8_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x double>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x double>
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16		store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16
store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @sitofp_8i8_8f64() #0 {		define void @sitofp_8i8_8f64() #0 {
; SSE-LABEL: @sitofp_8i8_8f64(		; SSE-LABEL: @sitofp_8i8_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = sitofp <2 x i8> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <2 x i8>*), align 4		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 6) to <2 x i8>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = sitofp <2 x i8> [[TMP1]] to <2 x double>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <2 x i8> [[TMP3]] to <2 x double>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i8> [[TMP2]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <2 x i8> [[TMP3]] to <2 x double>		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <2 x i8>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i8> [[TMP4]] to <2 x double>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i8> [[TMP5]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 6) to <2 x i8>*), align 2
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <2 x i8> [[TMP7]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_8i8_8f64(		; AVX256-LABEL: @sitofp_8i8_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x double>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x double>		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i8> [[TMP2]] to <4 x double>		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <4 x i8> [[TMP3]] to <4 x double>
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_8i8_8f64(		; AVX512-LABEL: @sitofp_8i8_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @sitofp_8i64_8f32() #0 {		define void @sitofp_8i64_8f32() #0 {
; SSE-LABEL: @sitofp_8i64_8f32(		; SSE-LABEL: @sitofp_8i64_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i64> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <4 x i64> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i64> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i64> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_8i64_8f32(		; AVX-LABEL: @sitofp_8i64_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i64> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i64> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @sitofp_8i32_8f32() #0 {		define void @sitofp_8i32_8f32() #0 {
; SSE-LABEL: @sitofp_8i32_8f32(		; SSE-LABEL: @sitofp_8i32_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_8i32_8f32(		; AVX-LABEL: @sitofp_8i32_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8		store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @sitofp_16i32_16f32() #0 {		define void @sitofp_16i32_16f32() #0 {
; SSE-LABEL: @sitofp_16i32_16f32(		; SSE-LABEL: @sitofp_16i32_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <4 x i32>*), align 32		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 12) to <4 x i32>*), align 16		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16
; SSE-NEXT: [[TMP5:%.*]] = sitofp <4 x i32> [[TMP1]] to <4 x float>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[TMP3]] to <4 x float>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i32> [[TMP2]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <4 x i32> [[TMP3]] to <4 x float>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <4 x i32>*), align 32
; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[TMP4]] to <4 x float>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i32> [[TMP5]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 12) to <4 x i32>*), align 16
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[TMP7]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_16i32_16f32(		; AVX256-LABEL: @sitofp_16i32_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <8 x i32>*), align 32		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x float>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <8 x i32> [[TMP1]] to <8 x float>		; AVX256-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i32> [[TMP2]] to <8 x float>		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <8 x i32>*), align 32
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i32> [[TMP3]] to <8 x float>
; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_16i32_16f32(		; AVX512-LABEL: @sitofp_16i32_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @src32 to <16 x i32>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @src32 to <16 x i32>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i32> [[TMP1]] to <16 x float>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i32> [[TMP1]] to <16 x float>
; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64		; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
[69 lines not shown]
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @sitofp_8i16_8f32() #0 {		define void @sitofp_8i16_8f32() #0 {
; SSE-LABEL: @sitofp_8i16_8f32(		; SSE-LABEL: @sitofp_8i16_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i16> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_8i16_8f32(		; AVX-LABEL: @sitofp_8i16_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
[23 lines not shown]
store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8		store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @sitofp_16i16_16f32() #0 {		define void @sitofp_16i16_16f32() #0 {
; SSE-LABEL: @sitofp_16i16_16f32(		; SSE-LABEL: @sitofp_16i16_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <4 x i16>*), align 16		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 12) to <4 x i16>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = sitofp <4 x i16> [[TMP1]] to <4 x float>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i16> [[TMP3]] to <4 x float>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i16> [[TMP2]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <4 x i16> [[TMP3]] to <4 x float>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <4 x i16>*), align 16
; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i16> [[TMP4]] to <4 x float>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i16> [[TMP5]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 12) to <4 x i16>*), align 8
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i16> [[TMP7]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_16i16_16f32(		; AVX256-LABEL: @sitofp_16i16_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>		; AVX256-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i16> [[TMP2]] to <8 x float>		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i16> [[TMP3]] to <8 x float>
; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_16i16_16f32(		; AVX512-LABEL: @sitofp_16i16_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @src16 to <16 x i16>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @src16 to <16 x i16>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i16> [[TMP1]] to <16 x float>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i16> [[TMP1]] to <16 x float>
; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64		; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
[69 lines not shown]
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @sitofp_8i8_8f32() #0 {		define void @sitofp_8i8_8f32() #0 {
; SSE-LABEL: @sitofp_8i8_8f32(		; SSE-LABEL: @sitofp_8i8_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i8> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i8> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sitofp_8i8_8f32(		; AVX-LABEL: @sitofp_8i8_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
[23 lines not shown]
store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8		store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @sitofp_16i8_16f32() #0 {		define void @sitofp_16i8_16f32() #0 {
; SSE-LABEL: @sitofp_16i8_16f32(		; SSE-LABEL: @sitofp_16i8_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <4 x i8>*), align 8		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 12) to <4 x i8>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = sitofp <4 x i8> [[TMP1]] to <4 x float>		; SSE-NEXT: [[TMP4:%.*]] = sitofp <4 x i8> [[TMP3]] to <4 x float>
; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i8> [[TMP2]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = sitofp <4 x i8> [[TMP3]] to <4 x float>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <4 x i8>*), align 8
; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i8> [[TMP4]] to <4 x float>		; SSE-NEXT: [[TMP6:%.*]] = sitofp <4 x i8> [[TMP5]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 12) to <4 x i8>*), align 4
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = sitofp <4 x i8> [[TMP7]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_16i8_16f32(		; AVX256-LABEL: @sitofp_16i8_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <8 x i8>*), align 8		; AVX256-NEXT: [[TMP2:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x float>
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <8 x i8> [[TMP1]] to <8 x float>		; AVX256-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i8> [[TMP2]] to <8 x float>		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <8 x i8>*), align 8
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i8> [[TMP3]] to <8 x float>
; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sitofp_16i8_16f32(		; AVX512-LABEL: @sitofp_16i8_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @src8 to <16 x i8>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @src8 to <16 x i8>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i8> [[TMP1]] to <16 x float>		; AVX512-NEXT: [[TMP2:%.*]] = sitofp <16 x i8> [[TMP1]] to <16 x float>
; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64		; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
[142 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/split-load8_2-unord.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake-avx512 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake-avx512 | FileCheck %s

	%struct.S = type { [8 x i32], [8 x i32], [16 x i32] }			%struct.S = type { [8 x i32], [8 x i32], [16 x i32] }

	define dso_local void @_Z4testP1S(%struct.S* %p) local_unnamed_addr {			define dso_local void @_Z4testP1S(%struct.S* %p) local_unnamed_addr {
	; CHECK-LABEL: @_Z4testP1S(			; CHECK-LABEL: @_Z4testP1S(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P:%.*]], i64 0, i32 1, i64 0			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P:%.*]], i64 0, i32 1, i64 0
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 15
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 0			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 0
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 1
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 7
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 1
	; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 2			; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 2
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 6
	; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 2			; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 2
	; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 3			; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 3
	; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 4
	; CHECK-NEXT: [[ARRAYIDX23:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 3			; CHECK-NEXT: [[ARRAYIDX23:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 3
	; CHECK-NEXT: [[ARRAYIDX25:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 4			; CHECK-NEXT: [[ARRAYIDX25:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 4
	; CHECK-NEXT: [[ARRAYIDX27:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 12
	; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 4
	; CHECK-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 5			; CHECK-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 5
	; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 13
	; CHECK-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX39:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 6			; CHECK-NEXT: [[ARRAYIDX39:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 6
	; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 14
	; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 6
	; CHECK-NEXT: [[ARRAYIDX46:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 7			; CHECK-NEXT: [[ARRAYIDX46:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 1, i64 7
				; CHECK-NEXT: [[ARRAYIDX51:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 7
				; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 15
				; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 7
				; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 6
				; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 4
				; CHECK-NEXT: [[ARRAYIDX27:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 12
				; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 13
				; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 14
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARRAYIDX]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARRAYIDX]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 5			; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 2, i64 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32*> poison, i32* [[ARRAYIDX1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32*> poison, i32* [[ARRAYIDX1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32*> [[TMP2]], i32* [[ARRAYIDX6]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32*> [[TMP2]], i32* [[ARRAYIDX6]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32*> [[TMP3]], i32* [[ARRAYIDX13]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32*> [[TMP3]], i32* [[ARRAYIDX13]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32*> [[TMP4]], i32* [[ARRAYIDX20]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32*> [[TMP4]], i32* [[ARRAYIDX20]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32*> [[TMP5]], i32* [[ARRAYIDX27]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32*> [[TMP5]], i32* [[ARRAYIDX27]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32*> [[TMP6]], i32* [[ARRAYIDX34]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32*> [[TMP6]], i32* [[ARRAYIDX34]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32*> [[TMP7]], i32* [[ARRAYIDX41]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32*> [[TMP7]], i32* [[ARRAYIDX41]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32*> [[TMP8]], i32* [[ARRAYIDX48]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32*> [[TMP8]], i32* [[ARRAYIDX48]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP9]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)			; CHECK-NEXT: [[TMP10:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP9]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)
	; CHECK-NEXT: [[TMP11:%.*]] = add nsw <8 x i32> [[TMP10]], [[TMP1]]			; CHECK-NEXT: [[TMP11:%.*]] = add nsw <8 x i32> [[TMP10]], [[TMP1]]
	; CHECK-NEXT: [[ARRAYIDX51:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[ARRAYIDX2]] to <8 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[ARRAYIDX2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[TMP11]], <8 x i32>* [[TMP12]], align 4			; CHECK-NEXT: store <8 x i32> [[TMP11]], <8 x i32>* [[TMP12]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds %struct.S, %struct.S* %p, i64 0, i32 1, i64 0			%arrayidx = getelementptr inbounds %struct.S, %struct.S* %p, i64 0, i32 1, i64 0
	%i = load i32, i32* %arrayidx, align 4			%i = load i32, i32* %arrayidx, align 4
	%arrayidx1 = getelementptr inbounds %struct.S, %struct.S* %p, i64 0, i32 2, i64 15			%arrayidx1 = getelementptr inbounds %struct.S, %struct.S* %p, i64 0, i32 2, i64 15
	[70 lines not shown]
	; CHECK-NEXT: [[G13:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P1]], i32 0, i64 7			; CHECK-NEXT: [[G13:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P1]], i32 0, i64 7
	; CHECK-NEXT: [[G20:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 12			; CHECK-NEXT: [[G20:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 12
	; CHECK-NEXT: [[G21:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 13			; CHECK-NEXT: [[G21:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 13
	; CHECK-NEXT: [[G22:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 14			; CHECK-NEXT: [[G22:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 14
	; CHECK-NEXT: [[G23:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 15			; CHECK-NEXT: [[G23:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 15
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P:%.*]], i64 0, i32 0, i64 0			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P:%.*]], i64 0, i32 0, i64 0
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 1
	; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 2			; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 2
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[G10]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX23:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 3			; CHECK-NEXT: [[ARRAYIDX23:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 3
	; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 4
	; CHECK-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 6
				; CHECK-NEXT: [[ARRAYIDX51:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 7
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[G10]] to <4 x i32>*
				; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[G20]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[G20]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[ARRAYIDX51:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> poison, <8 x i32> [[TMP4]], <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> poison, <8 x i32> [[TMP4]], <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <8 x i32> <i32 3, i32 1, i32 2, i32 0, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <8 x i32> <i32 3, i32 1, i32 2, i32 0, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP5]], <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP5]], <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX2]] to <8 x i32>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[TMP7]], <8 x i32>* [[TMP8]], align 4			; CHECK-NEXT: store <8 x i32> [[TMP7]], <8 x i32>* [[TMP8]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; CHECK-NEXT: [[G11:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P1]], i32 0, i64 5			; CHECK-NEXT: [[G11:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P1]], i32 0, i64 5
	; CHECK-NEXT: [[G12:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 6			; CHECK-NEXT: [[G12:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 6
	; CHECK-NEXT: [[G13:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 7			; CHECK-NEXT: [[G13:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P2]], i32 0, i64 7
	; CHECK-NEXT: [[G20:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P3]], i32 0, i64 12			; CHECK-NEXT: [[G20:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P3]], i32 0, i64 12
	; CHECK-NEXT: [[G21:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P3]], i32 0, i64 13			; CHECK-NEXT: [[G21:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P3]], i32 0, i64 13
	; CHECK-NEXT: [[G22:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P4]], i32 0, i64 14			; CHECK-NEXT: [[G22:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P4]], i32 0, i64 14
	; CHECK-NEXT: [[G23:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P4]], i32 0, i64 15			; CHECK-NEXT: [[G23:%.*]] = getelementptr inbounds [16 x i32], [16 x i32]* [[P4]], i32 0, i64 15
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P:%.*]], i64 0, i32 0, i64 0			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P:%.*]], i64 0, i32 0, i64 0
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[G10]] to <2 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 1			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 1
	; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 2			; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[G12]] to <2 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[ARRAYIDX23:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 3			; CHECK-NEXT: [[ARRAYIDX23:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 3
	; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 4			; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 4
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[G20]] to <2 x i32>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[TMP4]], align 4
	; CHECK-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 5			; CHECK-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 5
	; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 6			; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 6
				; CHECK-NEXT: [[ARRAYIDX51:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 7
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[G10]] to <2 x i32>*
				; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[G12]] to <2 x i32>*
				; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4
				; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[G20]] to <2 x i32>*
				; CHECK-NEXT: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[TMP4]], align 4
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[G22]] to <2 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[G22]] to <2 x i32>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i32>, <2 x i32>* [[TMP6]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i32>, <2 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: [[ARRAYIDX51:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 0, i32 0, i64 7
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <8 x i32> [[TMP12]], <8 x i32> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <8 x i32> [[TMP12]], <8 x i32> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast i32* [[ARRAYIDX2]] to <8 x i32>*			; CHECK-NEXT: [[TMP15:%.*]] = bitcast i32* [[ARRAYIDX2]] to <8 x i32>*

llvm/test/Transforms/SLPVectorizer/X86/sqrt.ll

;
store double %sqrt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8		store double %sqrt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 8
store double %sqrt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %sqrt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @sqrt_4f64() #0 {		define void @sqrt_4f64() #0 {
; SSE-LABEL: @sqrt_4f64(		; SSE-LABEL: @sqrt_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP1]])		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8
; SSE-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 8		; SSE-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP3]])
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 8
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sqrt_4f64(		; AVX-LABEL: @sqrt_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.sqrt.v4f64(<4 x double> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.sqrt.v4f64(<4 x double> [[TMP1]])
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 8
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;
store double %sqrt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8		store double %sqrt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 8
store double %sqrt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %sqrt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @sqrt_8f64() #0 {		define void @sqrt_8f64() #0 {
; SSE-LABEL: @sqrt_8f64(		; SSE-LABEL: @sqrt_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast ([8 x double]* @src64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP1]])		; SSE-NEXT: [[TMP4:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP3]])
; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP2]])		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP3]])		; SSE-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP4]])		; SSE-NEXT: [[TMP6:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP5]])
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 4		; SSE-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP7]])
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sqrt_8f64(		; AVX256-LABEL: @sqrt_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = call <4 x double> @llvm.sqrt.v4f64(<4 x double> [[TMP1]])
; AVX256-NEXT: [[TMP3:%.*]] = call <4 x double> @llvm.sqrt.v4f64(<4 x double> [[TMP1]])		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4
; AVX256-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.sqrt.v4f64(<4 x double> [[TMP2]])		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 4		; AVX256-NEXT: [[TMP4:%.*]] = call <4 x double> @llvm.sqrt.v4f64(<4 x double> [[TMP3]])
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sqrt_8f64(		; AVX512-LABEL: @sqrt_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.sqrt.v8f64(<8 x double> [[TMP1]])		; AVX512-NEXT: [[TMP2:%.*]] = call <8 x double> @llvm.sqrt.v8f64(<8 x double> [[TMP1]])
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 4		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;
store float %sqrt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4		store float %sqrt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 4
store float %sqrt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %sqrt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @sqrt_8f32() #0 {		define void @sqrt_8f32() #0 {
; SSE-LABEL: @sqrt_8f32(		; SSE-LABEL: @sqrt_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP1]])		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP2]])		; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP3]])
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @sqrt_8f32(		; AVX-LABEL: @sqrt_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4		; AVX-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.sqrt.v8f32(<8 x float> [[TMP1]])		; AVX-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.sqrt.v8f32(<8 x float> [[TMP1]])
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;
store float %sqrt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4		store float %sqrt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 4
store float %sqrt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %sqrt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @sqrt_16f32() #0 {		define void @sqrt_16f32() #0 {
; SSE-LABEL: @sqrt_16f32(		; SSE-LABEL: @sqrt_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([16 x float]* @src32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP1]])
; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4
; SSE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP1]])		; SSE-NEXT: [[TMP4:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP3]])
; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP2]])		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4
; SSE-NEXT: [[TMP7:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP3]])		; SSE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP4]])		; SSE-NEXT: [[TMP6:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP5]])
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 4		; SSE-NEXT: [[TMP7:%.*]] = load <4 x float>, <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 4		; SSE-NEXT: [[TMP8:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP7]])
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sqrt_16f32(		; AVX256-LABEL: @sqrt_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = call <8 x float> @llvm.sqrt.v8f32(<8 x float> [[TMP1]])
; AVX256-NEXT: [[TMP3:%.*]] = call <8 x float> @llvm.sqrt.v8f32(<8 x float> [[TMP1]])		; AVX256-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4
; AVX256-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.sqrt.v8f32(<8 x float> [[TMP2]])		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 4		; AVX256-NEXT: [[TMP4:%.*]] = call <8 x float> @llvm.sqrt.v8f32(<8 x float> [[TMP3]])
; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 4
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @sqrt_16f32(		; AVX512-LABEL: @sqrt_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* bitcast ([16 x float]* @src32 to <16 x float>*), align 4
; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.sqrt.v16f32(<16 x float> [[TMP1]])		; AVX512-NEXT: [[TMP2:%.*]] = call <16 x float> @llvm.sqrt.v16f32(<16 x float> [[TMP1]])
; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4		; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 4
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void

llvm/test/Transforms/SLPVectorizer/X86/store-jumbled.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -S -mtriple=x86_64-unknown -mattr=+avx -slp-vectorizer | FileCheck %s			; RUN: opt < %s -S -mtriple=x86_64-unknown -mattr=+avx -slp-vectorizer | FileCheck %s



	define i32 @jumbled-load(i32* noalias nocapture %in, i32* noalias nocapture %inn, i32* noalias nocapture %out) {			define i32 @jumbled-load(i32* noalias nocapture %in, i32* noalias nocapture %inn, i32* noalias nocapture %out) {
	; CHECK-LABEL: @jumbled-load(			; CHECK-LABEL: @jumbled-load(
	; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0			; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0
	; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1			; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1
	; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2			; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2
	; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3			; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
	; CHECK-NEXT: [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0			; CHECK-NEXT: [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0
	; CHECK-NEXT: [[GEP_4:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 1			; CHECK-NEXT: [[GEP_4:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 1
	; CHECK-NEXT: [[GEP_5:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 2			; CHECK-NEXT: [[GEP_5:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 2
	; CHECK-NEXT: [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3			; CHECK-NEXT: [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[INN_ADDR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0			; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
	; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1			; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
	; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2			; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
	; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3			; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*
				; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[INN_ADDR]] to <4 x i32>*
				; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4
				; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 0, i32 2>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 0, i32 2>
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* [[TMP6]], align 4			; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%in.addr = getelementptr inbounds i32, i32* %in, i64 0			%in.addr = getelementptr inbounds i32, i32* %in, i64 0
	%load.1 = load i32, i32* %in.addr, align 4			%load.1 = load i32, i32* %in.addr, align 4
	%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 1			%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 1

llvm/test/Transforms/SLPVectorizer/X86/stores-non-ordered.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -S -mtriple=x86_64-unknown -slp-vectorizer -slp-max-store-lookup=2 -slp-min-reg-size=64 -slp-threshold=-1000 | FileCheck %s			; RUN: opt < %s -S -mtriple=x86_64-unknown -slp-vectorizer -slp-max-store-lookup=2 -slp-min-reg-size=64 -slp-threshold=-1000 | FileCheck %s

	define i32 @non-ordered-stores(i32* noalias nocapture %in, i32* noalias nocapture %inn, i32* noalias nocapture %out) {			define i32 @non-ordered-stores(i32* noalias nocapture %in, i32* noalias nocapture %inn, i32* noalias nocapture %out) {
	; CHECK-LABEL: @non-ordered-stores(			; CHECK-LABEL: @non-ordered-stores(
	; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0			; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0
	; CHECK-NEXT: [[LOAD_1:%.*]] = load i32, i32* [[IN_ADDR]], align 4
	; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1			; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1
	; CHECK-NEXT: [[LOAD_2:%.*]] = load i32, i32* [[GEP_1]], align 4
	; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2			; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2
	; CHECK-NEXT: [[LOAD_3:%.*]] = load i32, i32* [[GEP_2]], align 4
	; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3			; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3
	; CHECK-NEXT: [[LOAD_4:%.*]] = load i32, i32* [[GEP_3]], align 4
	; CHECK-NEXT: [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0			; CHECK-NEXT: [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0
	; CHECK-NEXT: [[LOAD_5:%.*]] = load i32, i32* [[INN_ADDR]], align 4
	; CHECK-NEXT: [[GEP_4:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 1			; CHECK-NEXT: [[GEP_4:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 1
	; CHECK-NEXT: [[LOAD_6:%.*]] = load i32, i32* [[GEP_4]], align 4
	; CHECK-NEXT: [[GEP_5:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 2			; CHECK-NEXT: [[GEP_5:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 2
	; CHECK-NEXT: [[LOAD_7:%.*]] = load i32, i32* [[GEP_5]], align 4
	; CHECK-NEXT: [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3			; CHECK-NEXT: [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3
	; CHECK-NEXT: [[LOAD_8:%.*]] = load i32, i32* [[GEP_6]], align 4			; CHECK-NEXT: [[LOAD_1:%.*]] = load i32, i32* [[IN_ADDR]], align 4
				; CHECK-NEXT: [[LOAD_3:%.*]] = load i32, i32* [[GEP_2]], align 4
				; CHECK-NEXT: [[LOAD_5:%.*]] = load i32, i32* [[INN_ADDR]], align 4
				; CHECK-NEXT: [[LOAD_7:%.*]] = load i32, i32* [[GEP_5]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_1]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_1]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[LOAD_3]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[LOAD_3]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_5]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_5]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[LOAD_7]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[LOAD_7]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = mul <2 x i32> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = mul <2 x i32> [[TMP2]], [[TMP4]]
				; CHECK-NEXT: [[LOAD_2:%.*]] = load i32, i32* [[GEP_1]], align 4
				; CHECK-NEXT: [[LOAD_4:%.*]] = load i32, i32* [[GEP_3]], align 4
				; CHECK-NEXT: [[LOAD_6:%.*]] = load i32, i32* [[GEP_4]], align 4
				; CHECK-NEXT: [[LOAD_8:%.*]] = load i32, i32* [[GEP_6]], align 4
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_2]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_2]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> [[TMP6]], i32 [[LOAD_4]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> [[TMP6]], i32 [[LOAD_4]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_6]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> [[TMP8]], i32 [[LOAD_8]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> [[TMP8]], i32 [[LOAD_8]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = mul <2 x i32> [[TMP7]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = mul <2 x i32> [[TMP7]], [[TMP9]]
	; CHECK-NEXT: br label [[BLOCK1:%.*]]			; CHECK-NEXT: br label [[BLOCK1:%.*]]
	; CHECK: block1:			; CHECK: block1:
	; CHECK-NEXT: [[GEP_X:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 5			; CHECK-NEXT: [[GEP_X:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 5

llvm/test/Transforms/SLPVectorizer/X86/stores_vectorize.ll

	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 7			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 7
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 9			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 9
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 6			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 6
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 2			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 2
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 10			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 10
	; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 5			; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 5
	; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 3			; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 3
				; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 11
				; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 4
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[P3]] to <4 x i64>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[P3]] to <4 x i64>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* [[TMP0]], align 8
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 11
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[ARRAYIDX1]] to <4 x i64>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[ARRAYIDX1]] to <4 x i64>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = shl <4 x i64> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = shl <4 x i64> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i64> [[TMP4]], <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i64> [[TMP4]], <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[ARRAYIDX14]] to <4 x i64>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[ARRAYIDX14]] to <4 x i64>*
	; CHECK-NEXT: store <4 x i64> [[SHUFFLE]], <4 x i64>* [[TMP5]], align 8			; CHECK-NEXT: store <4 x i64> [[SHUFFLE]], <4 x i64>* [[TMP5]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i64, i64* %p3, align 8			%0 = load i64, i64* %p3, align 8
	%arrayidx1 = getelementptr inbounds i64, i64* %p3, i64 8			%arrayidx1 = getelementptr inbounds i64, i64* %p3, i64 8
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX1]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX1]], align 4
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[P4:%.*]], i64 3			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[P4:%.*]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX2]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
	; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4			; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 1
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 2			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 2
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 3			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[P3]] to <4 x i64>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = lshr <4 x i64> [[TMP3]], <i64 5, i64 5, i64 5, i64 5>
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 5			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 5
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[P3]] to <4 x i64>*
				; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* [[TMP2]], align 8
				; CHECK-NEXT: [[TMP4:%.*]] = lshr <4 x i64> [[TMP3]], <i64 5, i64 5, i64 5, i64 5>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[P3]] to <4 x i64>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[P3]] to <4 x i64>*
	; CHECK-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* [[TMP5]], align 8			; CHECK-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* [[TMP5]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	store i64 5, i64* %p3, align 8			store i64 5, i64* %p3, align 8
	%idx.ext = sext i32 %p2 to i64			%idx.ext = sext i32 %p2 to i64
	%add.ptr = getelementptr inbounds float, float* %p1, i64 %idx.ext			%add.ptr = getelementptr inbounds float, float* %p1, i64 %idx.ext
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX1]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX1]], align 4
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[P4:%.*]], i64 3			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[P4:%.*]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX2]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
	; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4			; CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 1
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 2			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 2
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 3			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[P3]] to <4 x i64>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = lshr <4 x i64> [[TMP3]], <i64 5, i64 5, i64 5, i64 5>
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 5			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 5
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
	; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8			; CHECK-NEXT: store i64 5, i64* [[ARRAYIDX9]], align 8
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[P3]] to <4 x i64>*
				; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* [[TMP2]], align 8
				; CHECK-NEXT: [[TMP4:%.*]] = lshr <4 x i64> [[TMP3]], <i64 5, i64 5, i64 5, i64 5>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[P3]] to <4 x i64>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[P3]] to <4 x i64>*
	; CHECK-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* [[TMP5]], align 8			; CHECK-NEXT: store <4 x i64> [[TMP4]], <4 x i64>* [[TMP5]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	store i64 5, i64* %p3, align 8			store i64 5, i64* %p3, align 8
	%idx.ext = sext i32 %p2 to i64			%idx.ext = sext i32 %p2 to i64
	%add.ptr = getelementptr inbounds float, float* %p1, i64 %idx.ext			%add.ptr = getelementptr inbounds float, float* %p1, i64 %idx.ext

llvm/test/Transforms/SLPVectorizer/X86/tiny-tree.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s

	define void @tiny_tree_fully_vectorizable(double* noalias nocapture %dst, double* noalias nocapture readonly %src, i64 %count) #0 {			define void @tiny_tree_fully_vectorizable(double* noalias nocapture %dst, double* noalias nocapture readonly %src, i64 %count) #0 {
	; CHECK-LABEL: @tiny_tree_fully_vectorizable(			; CHECK-LABEL: @tiny_tree_fully_vectorizable(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP12:%.*]] = icmp eq i64 [[COUNT:%.*]], 0			; CHECK-NEXT: [[CMP12:%.*]] = icmp eq i64 [[COUNT:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP12]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]			; CHECK-NEXT: br i1 [[CMP12]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_015:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[I_015:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[DST_ADDR_014:%.*]] = phi double* [ [[ADD_PTR4:%.*]], [[FOR_BODY]] ], [ [[DST:%.*]], [[ENTRY]] ]			; CHECK-NEXT: [[DST_ADDR_014:%.*]] = phi double* [ [[ADD_PTR4:%.*]], [[FOR_BODY]] ], [ [[DST:%.*]], [[ENTRY]] ]
	; CHECK-NEXT: [[SRC_ADDR_013:%.*]] = phi double* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[SRC:%.*]], [[ENTRY]] ]			; CHECK-NEXT: [[SRC_ADDR_013:%.*]] = phi double* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[SRC:%.*]], [[ENTRY]] ]
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[SRC_ADDR_013]], i64 1			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[SRC_ADDR_013]], i64 1
				; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[DST_ADDR_014]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[SRC_ADDR_013]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[SRC_ADDR_013]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[DST_ADDR_014]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[DST_ADDR_014]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[DST_ADDR_014]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP1]], <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: store <2 x double> [[TMP1]], <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds double, double* [[SRC_ADDR_013]], i64 [[I_015]]			; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds double, double* [[SRC_ADDR_013]], i64 [[I_015]]
	; CHECK-NEXT: [[ADD_PTR4]] = getelementptr inbounds double, double* [[DST_ADDR_014]], i64 [[I_015]]			; CHECK-NEXT: [[ADD_PTR4]] = getelementptr inbounds double, double* [[DST_ADDR_014]], i64 [[I_015]]
	; CHECK-NEXT: [[INC]] = add i64 [[I_015]], 1			; CHECK-NEXT: [[INC]] = add i64 [[I_015]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[COUNT]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[COUNT]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[I_023:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[I_023:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[DST_ADDR_022:%.*]] = phi float* [ [[ADD_PTR8:%.*]], [[FOR_BODY]] ], [ [[DST:%.*]], [[ENTRY]] ]			; CHECK-NEXT: [[DST_ADDR_022:%.*]] = phi float* [ [[ADD_PTR8:%.*]], [[FOR_BODY]] ], [ [[DST:%.*]], [[ENTRY]] ]
	; CHECK-NEXT: [[SRC_ADDR_021:%.*]] = phi float* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[SRC:%.*]], [[ENTRY]] ]			; CHECK-NEXT: [[SRC_ADDR_021:%.*]] = phi float* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[SRC:%.*]], [[ENTRY]] ]
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 1			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 1
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 2			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 2
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 2			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 2
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 3			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 3
				; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC_ADDR_021]] to <4 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC_ADDR_021]] to <4 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[DST_ADDR_022]] to <4 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[DST_ADDR_022]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP1]], <4 x float>* [[TMP2]], align 4			; CHECK-NEXT: store <4 x float> [[TMP1]], <4 x float>* [[TMP2]], align 4
	; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 [[I_023]]			; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 [[I_023]]
	; CHECK-NEXT: [[ADD_PTR8]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 [[I_023]]			; CHECK-NEXT: [[ADD_PTR8]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 [[I_023]]
	; CHECK-NEXT: [[INC]] = add i64 [[I_023]], 1			; CHECK-NEXT: [[INC]] = add i64 [[I_023]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[COUNT]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[COUNT]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-LABEL: @tiny_tree_not_fully_vectorizable2(			; CHECK-LABEL: @tiny_tree_not_fully_vectorizable2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP20:%.*]] = icmp eq i64 [[COUNT:%.*]], 0			; CHECK-NEXT: [[CMP20:%.*]] = icmp eq i64 [[COUNT:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP20]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]			; CHECK-NEXT: br i1 [[CMP20]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_023:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[I_023:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[DST_ADDR_022:%.*]] = phi float* [ [[ADD_PTR8:%.*]], [[FOR_BODY]] ], [ [[DST:%.*]], [[ENTRY]] ]			; CHECK-NEXT: [[DST_ADDR_022:%.*]] = phi float* [ [[ADD_PTR8:%.*]], [[FOR_BODY]] ], [ [[DST:%.*]], [[ENTRY]] ]
	; CHECK-NEXT: [[SRC_ADDR_021:%.*]] = phi float* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[SRC:%.*]], [[ENTRY]] ]			; CHECK-NEXT: [[SRC_ADDR_021:%.*]] = phi float* [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[SRC:%.*]], [[ENTRY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC_ADDR_021]], align 4
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 4			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 4
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 1			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 1
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 2			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 2
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 2			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 2
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 3			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 3
				; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 3
				; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC_ADDR_021]], align 4
				; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX4]] to <2 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX4]] to <2 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 4
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[DST_ADDR_022]] to <4 x float>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[DST_ADDR_022]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP7]], <4 x float>* [[TMP8]], align 4			; CHECK-NEXT: store <4 x float> [[TMP7]], <4 x float>* [[TMP8]], align 4
	; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 [[I_023]]			; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 [[I_023]]
	; CHECK-NEXT: [[ADD_PTR8]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 [[I_023]]			; CHECK-NEXT: [[ADD_PTR8]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 [[I_023]]

llvm/test/Transforms/SLPVectorizer/X86/uitofp.ll

store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64		store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64
store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @uitofp_4i64_4f64() #0 {		define void @uitofp_4i64_4f64() #0 {
; SSE-LABEL: @uitofp_4i64_4f64(		; SSE-LABEL: @uitofp_4i64_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 2) to <2 x i64>*), align 16		; SSE-NEXT: [[TMP2:%.*]] = uitofp <2 x i64> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = uitofp <2 x i64> [[TMP1]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i64> [[TMP2]] to <2 x double>		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 2) to <2 x i64>*), align 16
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i64> [[TMP3]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @uitofp_4i64_4f64(		; AVX-LABEL: @uitofp_4i64_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = uitofp <4 x i64> [[TMP1]] to <4 x double>		; AVX-NEXT: [[TMP2:%.*]] = uitofp <4 x i64> [[TMP1]] to <4 x double>
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16		store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16
store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @uitofp_8i64_8f64() #0 {		define void @uitofp_8i64_8f64() #0 {
; SSE-LABEL: @uitofp_8i64_8f64(		; SSE-LABEL: @uitofp_8i64_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 2) to <2 x i64>*), align 16		; SSE-NEXT: [[TMP2:%.*]] = uitofp <2 x i64> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <2 x i64>*), align 32		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 6) to <2 x i64>*), align 16		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 2) to <2 x i64>*), align 16
; SSE-NEXT: [[TMP5:%.*]] = uitofp <2 x i64> [[TMP1]] to <2 x double>		; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i64> [[TMP3]] to <2 x double>
; SSE-NEXT: [[TMP6:%.*]] = uitofp <2 x i64> [[TMP2]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = uitofp <2 x i64> [[TMP3]] to <2 x double>		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <2 x i64>*), align 32
; SSE-NEXT: [[TMP8:%.*]] = uitofp <2 x i64> [[TMP4]] to <2 x double>		; SSE-NEXT: [[TMP6:%.*]] = uitofp <2 x i64> [[TMP5]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 6) to <2 x i64>*), align 16
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = uitofp <2 x i64> [[TMP7]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @uitofp_8i64_8f64(		; AVX256-LABEL: @uitofp_8i64_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32		; AVX256-NEXT: [[TMP2:%.*]] = uitofp <4 x i64> [[TMP1]] to <4 x double>
; AVX256-NEXT: [[TMP3:%.*]] = uitofp <4 x i64> [[TMP1]] to <4 x double>		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = uitofp <4 x i64> [[TMP2]] to <4 x double>		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = uitofp <4 x i64> [[TMP3]] to <4 x double>
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @uitofp_8i64_8f64(		; AVX512-LABEL: @uitofp_8i64_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = uitofp <8 x i64> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = uitofp <8 x i64> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64		store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64
store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @uitofp_4i32_4f64() #0 {		define void @uitofp_4i32_4f64() #0 {
; SSE-LABEL: @uitofp_4i32_4f64(		; SSE-LABEL: @uitofp_4i32_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = uitofp <2 x i32> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = uitofp <2 x i32> [[TMP1]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i32> [[TMP2]] to <2 x double>		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i32> [[TMP3]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @uitofp_4i32_4f64(		; AVX-LABEL: @uitofp_4i32_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[TMP1]] to <4 x double>		; AVX-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[TMP1]] to <4 x double>
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16		store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16
store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @uitofp_8i32_8f64() #0 {		define void @uitofp_8i32_8f64() #0 {
; SSE-LABEL: @uitofp_8i32_8f64(		; SSE-LABEL: @uitofp_8i32_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @src32 to <2 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = uitofp <2 x i32> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <2 x i32>*), align 16		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 6) to <2 x i32>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 2) to <2 x i32>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = uitofp <2 x i32> [[TMP1]] to <2 x double>		; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i32> [[TMP3]] to <2 x double>
; SSE-NEXT: [[TMP6:%.*]] = uitofp <2 x i32> [[TMP2]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = uitofp <2 x i32> [[TMP3]] to <2 x double>		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <2 x i32>*), align 16
; SSE-NEXT: [[TMP8:%.*]] = uitofp <2 x i32> [[TMP4]] to <2 x double>		; SSE-NEXT: [[TMP6:%.*]] = uitofp <2 x i32> [[TMP5]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 6) to <2 x i32>*), align 8
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = uitofp <2 x i32> [[TMP7]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @uitofp_8i32_8f64(		; AVX256-LABEL: @uitofp_8i32_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16		; AVX256-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[TMP1]] to <4 x double>
; AVX256-NEXT: [[TMP3:%.*]] = uitofp <4 x i32> [[TMP1]] to <4 x double>		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = uitofp <4 x i32> [[TMP2]] to <4 x double>		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = uitofp <4 x i32> [[TMP3]] to <4 x double>
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @uitofp_8i32_8f64(		; AVX512-LABEL: @uitofp_8i32_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64		store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64
store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @uitofp_4i16_4f64() #0 {		define void @uitofp_4i16_4f64() #0 {
; SSE-LABEL: @uitofp_4i16_4f64(		; SSE-LABEL: @uitofp_4i16_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = uitofp <2 x i16> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = uitofp <2 x i16> [[TMP1]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i16> [[TMP2]] to <2 x double>		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i16> [[TMP3]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @uitofp_4i16_4f64(		; AVX-LABEL: @uitofp_4i16_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = uitofp <4 x i16> [[TMP1]] to <4 x double>		; AVX-NEXT: [[TMP2:%.*]] = uitofp <4 x i16> [[TMP1]] to <4 x double>
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16		store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16
store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @uitofp_8i16_8f64() #0 {		define void @uitofp_8i16_8f64() #0 {
; SSE-LABEL: @uitofp_8i16_8f64(		; SSE-LABEL: @uitofp_8i16_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i16>, <2 x i16>* bitcast ([32 x i16]* @src16 to <2 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = uitofp <2 x i16> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <2 x i16>*), align 8		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6) to <2 x i16>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2) to <2 x i16>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = uitofp <2 x i16> [[TMP1]] to <2 x double>		; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i16> [[TMP3]] to <2 x double>
; SSE-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP2]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = uitofp <2 x i16> [[TMP3]] to <2 x double>		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <2 x i16>*), align 8
; SSE-NEXT: [[TMP8:%.*]] = uitofp <2 x i16> [[TMP4]] to <2 x double>		; SSE-NEXT: [[TMP6:%.*]] = uitofp <2 x i16> [[TMP5]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i16>, <2 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6) to <2 x i16>*), align 4
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = uitofp <2 x i16> [[TMP7]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @uitofp_8i16_8f64(		; AVX256-LABEL: @uitofp_8i16_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8		; AVX256-NEXT: [[TMP2:%.*]] = uitofp <4 x i16> [[TMP1]] to <4 x double>
; AVX256-NEXT: [[TMP3:%.*]] = uitofp <4 x i16> [[TMP1]] to <4 x double>		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = uitofp <4 x i16> [[TMP2]] to <4 x double>		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = uitofp <4 x i16> [[TMP3]] to <4 x double>
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @uitofp_8i16_8f64(		; AVX512-LABEL: @uitofp_8i16_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = uitofp <8 x i16> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = uitofp <8 x i16> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64		store double %cvt0, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 0), align 64
store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8		store double %cvt1, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 1), align 8
ret void		ret void
}		}

define void @uitofp_4i8_4f64() #0 {		define void @uitofp_4i8_4f64() #0 {
; SSE-LABEL: @uitofp_4i8_4f64(		; SSE-LABEL: @uitofp_4i8_4f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = uitofp <2 x i8> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = uitofp <2 x i8> [[TMP1]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i8> [[TMP2]] to <2 x double>		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2
; SSE-NEXT: store <2 x double> [[TMP3]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i8> [[TMP3]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @uitofp_4i8_4f64(		; AVX-LABEL: @uitofp_4i8_4f64(
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = uitofp <4 x i8> [[TMP1]] to <4 x double>		; AVX-NEXT: [[TMP2:%.*]] = uitofp <4 x i8> [[TMP1]] to <4 x double>
; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16		store double %cvt2, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2), align 16
store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8		store double %cvt3, double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 3), align 8
ret void		ret void
}		}

define void @uitofp_8i8_8f64() #0 {		define void @uitofp_8i8_8f64() #0 {
; SSE-LABEL: @uitofp_8i8_8f64(		; SSE-LABEL: @uitofp_8i8_8f64(
; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* bitcast ([64 x i8]* @src8 to <2 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2		; SSE-NEXT: [[TMP2:%.*]] = uitofp <2 x i8> [[TMP1]] to <2 x double>
; SSE-NEXT: [[TMP3:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <2 x i8>*), align 4		; SSE-NEXT: store <2 x double> [[TMP2]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 6) to <2 x i8>*), align 2		; SSE-NEXT: [[TMP3:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 2) to <2 x i8>*), align 2
; SSE-NEXT: [[TMP5:%.*]] = uitofp <2 x i8> [[TMP1]] to <2 x double>		; SSE-NEXT: [[TMP4:%.*]] = uitofp <2 x i8> [[TMP3]] to <2 x double>
; SSE-NEXT: [[TMP6:%.*]] = uitofp <2 x i8> [[TMP2]] to <2 x double>		; SSE-NEXT: store <2 x double> [[TMP4]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = uitofp <2 x i8> [[TMP3]] to <2 x double>		; SSE-NEXT: [[TMP5:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <2 x i8>*), align 4
; SSE-NEXT: [[TMP8:%.*]] = uitofp <2 x i8> [[TMP4]] to <2 x double>		; SSE-NEXT: [[TMP6:%.*]] = uitofp <2 x i8> [[TMP5]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP5]], <2 x double>* bitcast ([8 x double]* @dst64 to <2 x double>*), align 64		; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32
; SSE-NEXT: store <2 x double> [[TMP6]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 2) to <2 x double>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 6) to <2 x i8>*), align 2
; SSE-NEXT: store <2 x double> [[TMP7]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <2 x double>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = uitofp <2 x i8> [[TMP7]] to <2 x double>
; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16		; SSE-NEXT: store <2 x double> [[TMP8]], <2 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 6) to <2 x double>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @uitofp_8i8_8f64(		; AVX256-LABEL: @uitofp_8i8_8f64(
; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4		; AVX256-NEXT: [[TMP2:%.*]] = uitofp <4 x i8> [[TMP1]] to <4 x double>
; AVX256-NEXT: [[TMP3:%.*]] = uitofp <4 x i8> [[TMP1]] to <4 x double>		; AVX256-NEXT: store <4 x double> [[TMP2]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = uitofp <4 x i8> [[TMP2]] to <4 x double>		; AVX256-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4
; AVX256-NEXT: store <4 x double> [[TMP3]], <4 x double>* bitcast ([8 x double]* @dst64 to <4 x double>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = uitofp <4 x i8> [[TMP3]] to <4 x double>
; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32		; AVX256-NEXT: store <4 x double> [[TMP4]], <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @dst64, i32 0, i64 4) to <4 x double>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @uitofp_8i8_8f64(		; AVX512-LABEL: @uitofp_8i8_8f64(
; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = uitofp <8 x i8> [[TMP1]] to <8 x double>		; AVX512-NEXT: [[TMP2:%.*]] = uitofp <8 x i8> [[TMP1]] to <8 x double>
; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64		; AVX512-NEXT: store <8 x double> [[TMP2]], <8 x double>* bitcast ([8 x double]* @dst64 to <8 x double>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @uitofp_8i64_8f32() #0 {		define void @uitofp_8i64_8f32() #0 {
; SSE-LABEL: @uitofp_8i64_8f32(		; SSE-LABEL: @uitofp_8i64_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @src64 to <4 x i64>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32		; SSE-NEXT: [[TMP2:%.*]] = uitofp <4 x i64> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = uitofp <4 x i64> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = uitofp <4 x i64> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 4) to <4 x i64>*), align 32
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = uitofp <4 x i64> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @uitofp_8i64_8f32(		; AVX-LABEL: @uitofp_8i64_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @src64 to <8 x i64>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = uitofp <8 x i64> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = uitofp <8 x i64> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @uitofp_8i32_8f32() #0 {		define void @uitofp_8i32_8f32() #0 {
; SSE-LABEL: @uitofp_8i32_8f32(		; SSE-LABEL: @uitofp_8i32_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16		; SSE-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = uitofp <4 x i32> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = uitofp <4 x i32> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = uitofp <4 x i32> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @uitofp_8i32_8f32(		; AVX-LABEL: @uitofp_8i32_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
[23 lines not shown]
store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8		store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @uitofp_16i32_16f32() #0 {		define void @uitofp_16i32_16f32() #0 {
; SSE-LABEL: @uitofp_16i32_16f32(		; SSE-LABEL: @uitofp_16i32_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @src32 to <4 x i32>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16		; SSE-NEXT: [[TMP2:%.*]] = uitofp <4 x i32> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <4 x i32>*), align 32		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 12) to <4 x i32>*), align 16		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 16
; SSE-NEXT: [[TMP5:%.*]] = uitofp <4 x i32> [[TMP1]] to <4 x float>		; SSE-NEXT: [[TMP4:%.*]] = uitofp <4 x i32> [[TMP3]] to <4 x float>
; SSE-NEXT: [[TMP6:%.*]] = uitofp <4 x i32> [[TMP2]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = uitofp <4 x i32> [[TMP3]] to <4 x float>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <4 x i32>*), align 32
; SSE-NEXT: [[TMP8:%.*]] = uitofp <4 x i32> [[TMP4]] to <4 x float>		; SSE-NEXT: [[TMP6:%.*]] = uitofp <4 x i32> [[TMP5]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 12) to <4 x i32>*), align 16
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = uitofp <4 x i32> [[TMP7]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @uitofp_16i32_16f32(		; AVX256-LABEL: @uitofp_16i32_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @src32 to <8 x i32>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <8 x i32>*), align 32		; AVX256-NEXT: [[TMP2:%.*]] = uitofp <8 x i32> [[TMP1]] to <8 x float>
; AVX256-NEXT: [[TMP3:%.*]] = uitofp <8 x i32> [[TMP1]] to <8 x float>		; AVX256-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = uitofp <8 x i32> [[TMP2]] to <8 x float>		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @src32, i32 0, i64 8) to <8 x i32>*), align 32
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = uitofp <8 x i32> [[TMP3]] to <8 x float>
; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @uitofp_16i32_16f32(		; AVX512-LABEL: @uitofp_16i32_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @src32 to <16 x i32>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @src32 to <16 x i32>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = uitofp <16 x i32> [[TMP1]] to <16 x float>		; AVX512-NEXT: [[TMP2:%.*]] = uitofp <16 x i32> [[TMP1]] to <16 x float>
; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64		; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
[69 lines not shown]
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @uitofp_8i16_8f32() #0 {		define void @uitofp_8i16_8f32() #0 {
; SSE-LABEL: @uitofp_8i16_8f32(		; SSE-LABEL: @uitofp_8i16_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = uitofp <4 x i16> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = uitofp <4 x i16> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = uitofp <4 x i16> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = uitofp <4 x i16> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @uitofp_8i16_8f32(		; AVX-LABEL: @uitofp_8i16_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = uitofp <8 x i16> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = uitofp <8 x i16> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
[23 lines not shown]
store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8		store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @uitofp_16i16_16f32() #0 {		define void @uitofp_16i16_16f32() #0 {
; SSE-LABEL: @uitofp_16i16_16f32(		; SSE-LABEL: @uitofp_16i16_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @src16 to <4 x i16>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8		; SSE-NEXT: [[TMP2:%.*]] = uitofp <4 x i16> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <4 x i16>*), align 16		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 12) to <4 x i16>*), align 8		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4) to <4 x i16>*), align 8
; SSE-NEXT: [[TMP5:%.*]] = uitofp <4 x i16> [[TMP1]] to <4 x float>		; SSE-NEXT: [[TMP4:%.*]] = uitofp <4 x i16> [[TMP3]] to <4 x float>
; SSE-NEXT: [[TMP6:%.*]] = uitofp <4 x i16> [[TMP2]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = uitofp <4 x i16> [[TMP3]] to <4 x float>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <4 x i16>*), align 16
; SSE-NEXT: [[TMP8:%.*]] = uitofp <4 x i16> [[TMP4]] to <4 x float>		; SSE-NEXT: [[TMP6:%.*]] = uitofp <4 x i16> [[TMP5]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 12) to <4 x i16>*), align 8
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = uitofp <4 x i16> [[TMP7]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @uitofp_16i16_16f32(		; AVX256-LABEL: @uitofp_16i16_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16		; AVX256-NEXT: [[TMP2:%.*]] = uitofp <8 x i16> [[TMP1]] to <8 x float>
; AVX256-NEXT: [[TMP3:%.*]] = uitofp <8 x i16> [[TMP1]] to <8 x float>		; AVX256-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = uitofp <8 x i16> [[TMP2]] to <8 x float>		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = uitofp <8 x i16> [[TMP3]] to <8 x float>
; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @uitofp_16i16_16f32(		; AVX512-LABEL: @uitofp_16i16_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @src16 to <16 x i16>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @src16 to <16 x i16>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = uitofp <16 x i16> [[TMP1]] to <16 x float>		; AVX512-NEXT: [[TMP2:%.*]] = uitofp <16 x i16> [[TMP1]] to <16 x float>
; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64		; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
;		;
[69 lines not shown]
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @uitofp_8i8_8f32() #0 {		define void @uitofp_8i8_8f32() #0 {
; SSE-LABEL: @uitofp_8i8_8f32(		; SSE-LABEL: @uitofp_8i8_8f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = uitofp <4 x i8> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = uitofp <4 x i8> [[TMP1]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = uitofp <4 x i8> [[TMP2]] to <4 x float>		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4
; SSE-NEXT: store <4 x float> [[TMP3]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: [[TMP4:%.*]] = uitofp <4 x i8> [[TMP3]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @uitofp_8i8_8f32(		; AVX-LABEL: @uitofp_8i8_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = uitofp <8 x i8> [[TMP1]] to <8 x float>		; AVX-NEXT: [[TMP2:%.*]] = uitofp <8 x i8> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
[23 lines not shown]
store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8		store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @uitofp_16i8_16f32() #0 {		define void @uitofp_16i8_16f32() #0 {
; SSE-LABEL: @uitofp_16i8_16f32(		; SSE-LABEL: @uitofp_16i8_16f32(
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* bitcast ([64 x i8]* @src8 to <4 x i8>*), align 64
; SSE-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4		; SSE-NEXT: [[TMP2:%.*]] = uitofp <4 x i8> [[TMP1]] to <4 x float>
; SSE-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <4 x i8>*), align 8		; SSE-NEXT: store <4 x float> [[TMP2]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 12) to <4 x i8>*), align 4		; SSE-NEXT: [[TMP3:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 4) to <4 x i8>*), align 4
; SSE-NEXT: [[TMP5:%.*]] = uitofp <4 x i8> [[TMP1]] to <4 x float>		; SSE-NEXT: [[TMP4:%.*]] = uitofp <4 x i8> [[TMP3]] to <4 x float>
; SSE-NEXT: [[TMP6:%.*]] = uitofp <4 x i8> [[TMP2]] to <4 x float>		; SSE-NEXT: store <4 x float> [[TMP4]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16
; SSE-NEXT: [[TMP7:%.*]] = uitofp <4 x i8> [[TMP3]] to <4 x float>		; SSE-NEXT: [[TMP5:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <4 x i8>*), align 8
; SSE-NEXT: [[TMP8:%.*]] = uitofp <4 x i8> [[TMP4]] to <4 x float>		; SSE-NEXT: [[TMP6:%.*]] = uitofp <4 x i8> [[TMP5]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP5]], <4 x float>* bitcast ([16 x float]* @dst32 to <4 x float>*), align 64		; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32
; SSE-NEXT: store <4 x float> [[TMP6]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4) to <4 x float>*), align 16		; SSE-NEXT: [[TMP7:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 12) to <4 x i8>*), align 4
; SSE-NEXT: store <4 x float> [[TMP7]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <4 x float>*), align 32		; SSE-NEXT: [[TMP8:%.*]] = uitofp <4 x i8> [[TMP7]] to <4 x float>
; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16		; SSE-NEXT: store <4 x float> [[TMP8]], <4 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12) to <4 x float>*), align 16
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @uitofp_16i8_16f32(		; AVX256-LABEL: @uitofp_16i8_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* bitcast ([64 x i8]* @src8 to <8 x i8>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <8 x i8>*), align 8		; AVX256-NEXT: [[TMP2:%.*]] = uitofp <8 x i8> [[TMP1]] to <8 x float>
; AVX256-NEXT: [[TMP3:%.*]] = uitofp <8 x i8> [[TMP1]] to <8 x float>		; AVX256-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX256-NEXT: [[TMP4:%.*]] = uitofp <8 x i8> [[TMP2]] to <8 x float>		; AVX256-NEXT: [[TMP3:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @src8, i32 0, i64 8) to <8 x i8>*), align 8
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX256-NEXT: [[TMP4:%.*]] = uitofp <8 x i8> [[TMP3]] to <8 x float>
; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32		; AVX256-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32
; AVX256-NEXT: ret void		; AVX256-NEXT: ret void
;		;
; AVX512-LABEL: @uitofp_16i8_16f32(		; AVX512-LABEL: @uitofp_16i8_16f32(
; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @src8 to <16 x i8>*), align 64		; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @src8 to <16 x i8>*), align 64
; AVX512-NEXT: [[TMP2:%.*]] = uitofp <16 x i8> [[TMP1]] to <16 x float>		; AVX512-NEXT: [[TMP2:%.*]] = uitofp <16 x i8> [[TMP1]] to <16 x float>
; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64		; AVX512-NEXT: store <16 x float> [[TMP2]], <16 x float>* bitcast ([16 x float]* @dst32 to <16 x float>*), align 64
; AVX512-NEXT: ret void		; AVX512-NEXT: ret void
[53 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-alt-shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	define void @foo(i8* %c, float* %d) {			define void @foo(i8* %c, float* %d) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8* [[C:%.*]], i64 4			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8* [[C:%.*]], i64 4
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i8, i8* [[C]], i64 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i8, i8* [[C]], i64 1
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i8, i8* [[C]], i64 2			; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i8, i8* [[C]], i64 2
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i8, i8* [[C]], i64 3			; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i8, i8* [[C]], i64 3
				; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds float, float* [[D:%.*]], i64 -1
				; CHECK-NEXT: [[ADD_PTR37:%.*]] = getelementptr inbounds float, float* [[D]], i64 -2
				; CHECK-NEXT: [[ADD_PTR45:%.*]] = getelementptr inbounds float, float* [[D]], i64 -3
				; CHECK-NEXT: [[ADD_PTR53:%.*]] = getelementptr inbounds float, float* [[D]], i64 -4
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ARRAYIDX4]] to <4 x i8>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ARRAYIDX4]] to <4 x i8>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1
	; CHECK-NEXT: [[TMP2:%.*]] = zext <4 x i8> [[TMP1]] to <4 x i32>			; CHECK-NEXT: [[TMP2:%.*]] = zext <4 x i8> [[TMP1]] to <4 x i32>
	; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw <4 x i32> [[TMP2]], <i32 2, i32 2, i32 2, i32 3>			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw <4 x i32> [[TMP2]], <i32 2, i32 2, i32 2, i32 3>
	; CHECK-NEXT: [[TMP4:%.*]] = and <4 x i32> [[TMP2]], <i32 2, i32 2, i32 2, i32 3>			; CHECK-NEXT: [[TMP4:%.*]] = and <4 x i32> [[TMP2]], <i32 2, i32 2, i32 2, i32 3>
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 1, i32 2, i32 7, i32 0>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 1, i32 2, i32 7, i32 0>
	; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds float, float* [[D:%.*]], i64 -1
	; CHECK-NEXT: [[ADD_PTR37:%.*]] = getelementptr inbounds float, float* [[D]], i64 -2
	; CHECK-NEXT: [[ADD_PTR45:%.*]] = getelementptr inbounds float, float* [[D]], i64 -3
	; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> poison, [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> poison, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = sitofp <4 x i32> [[TMP6]] to <4 x float>			; CHECK-NEXT: [[TMP7:%.*]] = sitofp <4 x i32> [[TMP6]] to <4 x float>
	; CHECK-NEXT: [[TMP8:%.*]] = fdiv <4 x float> [[TMP7]], poison			; CHECK-NEXT: [[TMP8:%.*]] = fdiv <4 x float> [[TMP7]], poison
	; CHECK-NEXT: [[ADD_PTR53:%.*]] = getelementptr inbounds float, float* [[D]], i64 -4
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[ADD_PTR53]] to <4 x float>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[ADD_PTR53]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP8]], <4 x float>* [[TMP9]], align 4			; CHECK-NEXT: store <4 x float> [[TMP8]], <4 x float>* [[TMP9]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%arrayidx1 = getelementptr inbounds i8, i8* %c, i64 4			%arrayidx1 = getelementptr inbounds i8, i8* %c, i64 4
	%0 = load i8, i8* %arrayidx1, align 1			%0 = load i8, i8* %arrayidx1, align 1
	%conv2 = zext i8 %0 to i32			%conv2 = zext i8 %0 to i32
[35 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/vectorize-reordered-list.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S --slp-vectorizer -mtriple=x86_64-unknown %s | FileCheck %s			; RUN: opt -S --slp-vectorizer -mtriple=x86_64-unknown %s | FileCheck %s

	define void @test(double* %isec) {			define void @test(double* %isec) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[ISEC:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[ISEC:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds double, double* [[ISEC]], i64 0			; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds double, double* [[ISEC]], i64 0
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX10]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[ISEC]], i64 3			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[ISEC]], i64 3
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[ISEC]], i64 2			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[ISEC]], i64 2
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX10]] to <2 x double>*
				; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX2]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX2]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> [[TMP5]], <2 x i32> <i32 1, i32 2>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> [[TMP5]], <2 x i32> <i32 1, i32 2>
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[ARRAYIDX10]] to <2 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[ARRAYIDX10]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP6]], <2 x double>* [[TMP7]], align 8			; CHECK-NEXT: store <2 x double> [[TMP6]], <2 x double>* [[TMP7]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	(16 lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll

	(11 lines not shown)
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP18:%.*]], [[BB3:%.*]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP18:%.*]], [[BB3:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double			; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[CONV2]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[CONV2]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> [[TMP7]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP9:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x double> poison, double [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x double> poison, double [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP8]], i32 1
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x double> [[TMP11]], double [[TMP12]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x double> [[TMP11]], double [[TMP12]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = fcmp ogt <4 x double> [[TMP13]], [[TMP4]]			; CHECK-NEXT: [[TMP14:%.*]] = fcmp ogt <4 x double> [[TMP13]], [[TMP9]]
	; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP8]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP16:%.*]] = fptrunc <4 x double> [[TMP15]] to <4 x float>			; CHECK-NEXT: [[TMP16:%.*]] = fptrunc <4 x double> [[TMP15]] to <4 x float>
	; CHECK-NEXT: [[TMP17:%.*]] = select <4 x i1> [[TMP14]], <4 x float> [[TMP2]], <4 x float> [[TMP16]]			; CHECK-NEXT: [[TMP17:%.*]] = select <4 x i1> [[TMP14]], <4 x float> [[TMP2]], <4 x float> [[TMP16]]
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP18]] = phi <4 x float> [ [[TMP17]], [[BB4]] ], [ [[TMP2]], [[BB2]] ]			; CHECK-NEXT: [[TMP18]] = phi <4 x float> [ [[TMP17]], [[BB4]] ], [ [[TMP2]], [[BB2]] ]
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	;			;
	entry:			entry:
	(44 lines not shown)

llvm/test/Transforms/SLPVectorizer/int_sideeffect.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S < %s -slp-vectorizer -slp-max-reg-size=128 -slp-min-reg-size=128 | FileCheck %s			; RUN: opt -S < %s -slp-vectorizer -slp-max-reg-size=128 -slp-min-reg-size=128 | FileCheck %s

	declare void @llvm.sideeffect()			declare void @llvm.sideeffect()

	; SLP vectorization across a @llvm.sideeffect.			; SLP vectorization across a @llvm.sideeffect.

	define void @test_sideeffect(float* %p) {			define void @test_sideeffect(float* %p) {
	; CHECK-LABEL: @test_sideeffect(			; CHECK-LABEL: @test_sideeffect(
	; CHECK-NEXT: [[P0:%.*]] = getelementptr float, float* [[P:%.*]], i64 0			; CHECK-NEXT: [[P0:%.*]] = getelementptr float, float* [[P:%.*]], i64 0
	; CHECK-NEXT: [[P1:%.*]] = getelementptr float, float* [[P]], i64 1			; CHECK-NEXT: [[P1:%.*]] = getelementptr float, float* [[P]], i64 1
	; CHECK-NEXT: [[P2:%.*]] = getelementptr float, float* [[P]], i64 2			; CHECK-NEXT: [[P2:%.*]] = getelementptr float, float* [[P]], i64 2
	; CHECK-NEXT: [[P3:%.*]] = getelementptr float, float* [[P]], i64 3			; CHECK-NEXT: [[P3:%.*]] = getelementptr float, float* [[P]], i64 3
	; CHECK-NEXT: call void @llvm.sideeffect()			; CHECK-NEXT: call void @llvm.sideeffect()
				; CHECK-NEXT: call void @llvm.sideeffect()
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: call void @llvm.sideeffect()
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[P0]] to <4 x float>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[P0]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4			; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%p0 = getelementptr float, float* %p, i64 0			%p0 = getelementptr float, float* %p, i64 0
	%p1 = getelementptr float, float* %p, i64 1			%p1 = getelementptr float, float* %p, i64 1
	%p2 = getelementptr float, float* %p, i64 2			%p2 = getelementptr float, float* %p, i64 2
	%p3 = getelementptr float, float* %p, i64 3			%p3 = getelementptr float, float* %p, i64 3
	(14 lines not shown)

	define void @test_inaccessiblememonly(float* %p) {			define void @test_inaccessiblememonly(float* %p) {
	; CHECK-LABEL: @test_inaccessiblememonly(			; CHECK-LABEL: @test_inaccessiblememonly(
	; CHECK-NEXT: [[P0:%.*]] = getelementptr float, float* [[P:%.*]], i64 0			; CHECK-NEXT: [[P0:%.*]] = getelementptr float, float* [[P:%.*]], i64 0
	; CHECK-NEXT: [[P1:%.*]] = getelementptr float, float* [[P]], i64 1			; CHECK-NEXT: [[P1:%.*]] = getelementptr float, float* [[P]], i64 1
	; CHECK-NEXT: [[P2:%.*]] = getelementptr float, float* [[P]], i64 2			; CHECK-NEXT: [[P2:%.*]] = getelementptr float, float* [[P]], i64 2
	; CHECK-NEXT: [[P3:%.*]] = getelementptr float, float* [[P]], i64 3			; CHECK-NEXT: [[P3:%.*]] = getelementptr float, float* [[P]], i64 3
	; CHECK-NEXT: call void @foo() #[[ATTR1:[0-9]+]]			; CHECK-NEXT: call void @foo() #[[ATTR1:[0-9]+]]
				; CHECK-NEXT: call void @foo() #[[ATTR1]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P0]] to <4 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P0]] to <4 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: call void @foo() #[[ATTR1]]
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[P0]] to <4 x float>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[P0]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4			; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%p0 = getelementptr float, float* %p, i64 0			%p0 = getelementptr float, float* %p, i64 0
	%p1 = getelementptr float, float* %p, i64 1			%p1 = getelementptr float, float* %p, i64 1
	%p2 = getelementptr float, float* %p, i64 2			%p2 = getelementptr float, float* %p, i64 2
	%p3 = getelementptr float, float* %p, i64 3			%p3 = getelementptr float, float* %p, i64 3
	(12 lines not shown)

[SLP] Schedule only sub-graph of vectorizable instructions
Closed, Public

Revision Contents

Diff 405696

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/AArch64/64-bit-vector.ll

llvm/test/Transforms/SLPVectorizer/AArch64/commute.ll

llvm/test/Transforms/SLPVectorizer/AArch64/gather-reduce.ll

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

llvm/test/Transforms/SLPVectorizer/AArch64/loadi8.ll

llvm/test/Transforms/SLPVectorizer/AArch64/matmul.ll

llvm/test/Transforms/SLPVectorizer/AArch64/memory-runtime-checks.ll

llvm/test/Transforms/SLPVectorizer/AArch64/sdiv-pow2.ll

llvm/test/Transforms/SLPVectorizer/AArch64/slp-and-reduction.ll

llvm/test/Transforms/SLPVectorizer/AArch64/slp-or-reduction.ll

llvm/test/Transforms/SLPVectorizer/AArch64/slp-xor-reduction.ll

llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-di.ll

llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-order.ll

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s352.ll

llvm/test/Transforms/SLPVectorizer/AArch64/widen.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/packed-math.ll

llvm/test/Transforms/SLPVectorizer/NVPTX/v2f16.ll

llvm/test/Transforms/SLPVectorizer/SystemZ/pr34619.ll

llvm/test/Transforms/SLPVectorizer/X86/PR32086.ll

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

llvm/test/Transforms/SLPVectorizer/X86/addsub.ll

llvm/test/Transforms/SLPVectorizer/X86/align.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-abs.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-add-usat.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-add.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-div.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-fix.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-mul.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-smax.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-smin.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-sub-ssat.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-sub-usat.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-sub.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-umax.ll

llvm/test/Transforms/SLPVectorizer/X86/arith-umin.ll

llvm/test/Transforms/SLPVectorizer/X86/bitreverse.ll

llvm/test/Transforms/SLPVectorizer/X86/broadcast.ll

llvm/test/Transforms/SLPVectorizer/X86/bswap.ll

llvm/test/Transforms/SLPVectorizer/X86/combined-stores-chains.ll

llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll

llvm/test/Transforms/SLPVectorizer/X86/continue_vectorizing.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_mandeltext.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

llvm/test/Transforms/SLPVectorizer/X86/ctlz.ll

llvm/test/Transforms/SLPVectorizer/X86/ctpop.ll

llvm/test/Transforms/SLPVectorizer/X86/cttz.ll

llvm/test/Transforms/SLPVectorizer/X86/diamond.ll

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast.ll

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/different-vec-widths.ll

llvm/test/Transforms/SLPVectorizer/X86/dot-product.ll

llvm/test/Transforms/SLPVectorizer/X86/extract_in_tree_user.ll

llvm/test/Transforms/SLPVectorizer/X86/fabs.ll

llvm/test/Transforms/SLPVectorizer/X86/fcopysign.ll

llvm/test/Transforms/SLPVectorizer/X86/fma.ll

llvm/test/Transforms/SLPVectorizer/X86/fmaxnum.ll

llvm/test/Transforms/SLPVectorizer/X86/fminnum.ll

llvm/test/Transforms/SLPVectorizer/X86/fmuladd.ll

llvm/test/Transforms/SLPVectorizer/X86/fptosi-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/fptosi.ll

llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll

llvm/test/Transforms/SLPVectorizer/X86/fround.ll

llvm/test/Transforms/SLPVectorizer/X86/funclet.ll

llvm/test/Transforms/SLPVectorizer/X86/gep.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll
