This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll
llvm/test/Transforms/SLPVectorizer/X86/arith-mul.ll
llvm/test/Transforms/SLPVectorizer/X86/arith-sub-ssat.ll
llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll
llvm/test/Transforms/SLPVectorizer/X86/shift-ashr.ll
llvm/test/Transforms/SLPVectorizer/X86/sitofp.ll
llvm/test/Transforms/SLPVectorizer/X86/uitofp.ll

Differential D94974

[SLP] Try doubled MaxElts for stores vectorization
Abandoned · Public

Authored by anton-afanasyev on Jan 19 2021, 8:19 AM.

Details

Reviewers

RKSimon
ABataev
dtemirbulatov

Summary

Try vectors of 2 * MaxElts elements for store vectorization. This commit
is motivated by the effect of the bug fix in reviews.llvm.org/D93192 and tries
to compensate for it.
There can be cases where, for instance, the cost of vectorizing as a pair of
<4 x float> vectors is zero, yet vectorizing as a single <8 x float> is beneficial.
Of course, an LLVM vector with 2 * MaxElts elements cannot be lowered to one
register; it is split across two registers.
We check 2 * MaxElts after MaxElts so as not to interfere with the ordinary
vectorization, which may already be accepted as beneficial on its own.
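
The probing order described in the summary can be sketched in isolation. This is a minimal standalone illustration, not the patch itself: `candidateSizes` is a hypothetical helper name, `std::vector` stands in for LLVM's `SmallVector`, and `MaxElts` is assumed to already be a power of two.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): returns slice sizes in the order
// the summary describes: MaxElts first, then 2 * MaxElts, then the
// usual halving sequence down to 2.
std::vector<unsigned> candidateSizes(unsigned MaxElts) {
  assert(MaxElts != 0 && (MaxElts & (MaxElts - 1)) == 0 &&
         "sketch assumes a power-of-two element count");
  std::vector<unsigned> Sizes;
  Sizes.push_back(MaxElts);     // ordinary full-register width
  Sizes.push_back(2 * MaxElts); // doubled width, split across two registers
  for (unsigned Size = MaxElts / 2; Size >= 2; Size /= 2)
    Sizes.push_back(Size);      // narrower fallbacks
  return Sizes;
}
```

For a 128-bit register and `float` elements (`MaxElts` = 4) this yields 4, 8, 2, so the doubled width is only considered after the regular width has had its chance.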

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	110 ms	x64 windows > LLVM.CodeGen/XCore::threads.ll

Event Timeline

anton-afanasyev created this revision. · Jan 19 2021, 8:19 AM

Herald added a subscriber: hiraditya. · Jan 19 2021, 8:19 AM

anton-afanasyev requested review of this revision. · Jan 19 2021, 8:19 AM

Herald added a project: Restricted Project. · Jan 19 2021, 8:19 AM

Herald added a subscriber: llvm-commits.

Fixed comment containing this D94974 revision number

RKSimon added inline comments. · Jan 19 2021, 8:22 AM

llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll
136–146	please can you cleanup all these checks ?

Harbormaster completed remote builds in B85720: Diff 317573. · Jan 19 2021, 9:40 AM

Harbormaster completed remote builds in B85723: Diff 317579.

Small test fix

anton-afanasyev marked an inline comment as done. · Jan 27 2021, 9:22 AM

anton-afanasyev added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll
136–146	Fixed this line in test or did you mean to precommit check prefixes?

RKSimon added inline comments. · Jan 27 2021, 10:12 AM

llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll
136–146	We seem to have AVX and AVX1 check prefixes now - go back and replace the check-prefixes=AVX with check-prefixes=AVX1 (not sure if we can have a common AVX for AVX1 + AVX2)?

Clean up tests

anton-afanasyev marked an inline comment as done. · Jan 27 2021, 10:34 AM

anton-afanasyev added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll
136–146	Oh, I see. Done. (no, we can't use common AVX for AVX1 + AVX2 in that case).

RKSimon added inline comments. · Jan 27 2021, 10:45 AM

llvm/test/Transforms/SLPVectorizer/X86/arith-mul.ll
5	--check-prefixes=AVX_PREFER128,AVX1_PREFER128
7	--check-prefixes=AVX_PREFER128,AVX2_PREFER128
183	We're setting prefer-128-bit and yet still generating <4 x i64> ops?

Harbormaster completed remote builds in B86863: Diff 319596. · Jan 27 2021, 10:49 AM

Harbormaster completed remote builds in B86875: Diff 319616. · Jan 27 2021, 12:06 PM

anton-afanasyev added inline comments. · Feb 2 2021, 7:20 AM

llvm/test/Transforms/SLPVectorizer/X86/arith-mul.ll
183	Hmm, yes, you're right, it's strange to generate `<4 x i64>` when the preferred width is 128. But we can't check this at the abstract LLVM level. In general we don't know the target constraints, so this patch looks too tricky for such cases.

Given the above, I'm going to abandon this change. It looks like over-optimization that breaks the LLVM IR middle-end abstraction.
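
The arithmetic behind the `<4 x i64>` observation in the arith-mul.ll thread can be checked with a small sketch. It assumes that the prefer-128-bit configuration makes the vectorizer see a 128-bit maximum register size; `maxElts` and `doubledVectorBits` are hypothetical helper names, not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical helper: number of elements that fit in one vector register.
// Mirrors the MaxVecRegSize / EltSize division from the patch, without the
// power-of-two rounding (a no-op for the sizes used here).
unsigned maxElts(unsigned MaxVecRegSizeBits, unsigned EltSizeBits) {
  assert(EltSizeBits != 0 && MaxVecRegSizeBits % EltSizeBits == 0 &&
         "register size must be a multiple of the element size");
  return MaxVecRegSizeBits / EltSizeBits;
}

// Hypothetical helper: bit width of the doubled vector the patch also tries.
unsigned doubledVectorBits(unsigned MaxVecRegSizeBits, unsigned EltSizeBits) {
  return 2 * maxElts(MaxVecRegSizeBits, EltSizeBits) * EltSizeBits;
}
```

With a 128-bit limit and `i64` elements, `maxElts` is 2, so the doubled candidate has 4 elements and is 256 bits wide. Under the stated assumption, that is the `<4 x i64>` the reviewer noticed, twice the preferred width.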

anton-afanasyev abandoned this revision. · Feb 5 2021, 5:43 AM

Revision Contents

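Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	13 lines
llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll	152 lines
llvm/test/Transforms/SLPVectorizer/X86/arith-mul.ll	164 lines
llvm/test/Transforms/SLPVectorizer/X86/arith-sub-ssat.ll	152 lines
llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll	13 lines
llvm/test/Transforms/SLPVectorizer/X86/shift-ashr.ll	62 lines
llvm/test/Transforms/SLPVectorizer/X86/sitofp.ll	91 lines
llvm/test/Transforms/SLPVectorizer/X86/uitofp.ll	91 lines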

Diff 319616

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

(6,074 lines not shown; enclosing context: for (int Cnt = E; Cnt > 0; --Cnt) {)
 }
 
 // If a vector register can't hold 1 element, we are done.
 unsigned MaxVecRegSize = R.getMaxVecRegSize();
 unsigned EltSize = R.getVectorElementSize(Operands[0]);
 if (MaxVecRegSize % EltSize != 0)
   continue;
 
-unsigned MaxElts = MaxVecRegSize / EltSize;
+unsigned MaxElts = llvm::PowerOf2Ceil(MaxVecRegSize / EltSize);
 // FIXME: Is division-by-2 the correct step? Should we assert that the
 // register size is a power-of-2?
+// We check Size = 2 * MaxElts after Size = MaxElts to cover
+// the border cases (see D94974 for details).
+SmallVector<unsigned, 4> Sizes;
+Sizes.push_back(MaxElts);
+Sizes.push_back(2 * MaxElts);
+for (unsigned Size = MaxElts / 2; Size >= 2; Size /= 2) {
+  Sizes.push_back(Size);
+}
 
 unsigned StartIdx = 0;
-for (unsigned Size = llvm::PowerOf2Ceil(MaxElts); Size >= 2; Size /= 2) {
+for (unsigned Size : Sizes) {
   for (unsigned Cnt = StartIdx, E = Operands.size(); Cnt + Size <= E;) {
     ArrayRef<Value *> Slice = makeArrayRef(Operands).slice(Cnt, Size);
     if (!VectorizedStores.count(Slice.front()) &&
         !VectorizedStores.count(Slice.back()) &&
         vectorizeStoreChain(Slice, R, Cnt)) {
       // Mark the vectorized stores so that we don't vectorize them again.
       VectorizedStores.insert(Slice.begin(), Slice.end());
       Changed = true;
(1,832 lines not shown)
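
The effect of the reordered loop on a store chain can be modeled without LLVM. The following is a simplified sketch, not the pass itself: stores are plain indices, `trySizes` is a hypothetical name, and `IsProfitable` is a hypothetical stand-in for `vectorizeStoreChain`. The hunk above is cut off after `Changed = true;`, so the way the cursor advances (by `Size` on success, by one on failure) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Simplified model of the store-chain loop. Returns the accepted slices as
// (start index, size) pairs, in the order they were accepted.
std::vector<std::pair<std::size_t, unsigned>>
trySizes(std::size_t NumStores, const std::vector<unsigned> &Sizes,
         const std::function<bool(unsigned)> &IsProfitable) {
  std::set<std::size_t> Vectorized; // indices already covered by a slice
  std::vector<std::pair<std::size_t, unsigned>> Accepted;
  for (unsigned Size : Sizes) {
    assert(Size >= 2 && "a slice needs at least two stores");
    for (std::size_t Cnt = 0; Cnt + Size <= NumStores;) {
      // Like the patch, only the first and last store of the slice are checked.
      bool Free = !Vectorized.count(Cnt) && !Vectorized.count(Cnt + Size - 1);
      if (Free && IsProfitable(Size)) {
        for (std::size_t I = Cnt; I < Cnt + Size; ++I)
          Vectorized.insert(I);
        Accepted.emplace_back(Cnt, Size);
        Cnt += Size; // assumption: skip past the slice on success
        continue;
      }
      ++Cnt; // assumption: slide the window by one store on failure
    }
  }
  return Accepted;
}
```

With eight stores and sizes 4, 8, 2, a predicate that accepts only width 8 produces a single slice (0, 8), which is the case the summary targets. A predicate that accepts every width produces (0, 4) and (4, 4), so the doubled width never displaces ordinary vectorization.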

llvm/test/Transforms/SLPVectorizer/X86/arith-add-ssat.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=SLM			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=AVX1
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=AVX2
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=AVX512			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=AVX512
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=-prefer-256-bit -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=AVX512			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=-prefer-256-bit -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=AVX512
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=+prefer-256-bit -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=+prefer-256-bit -basic-aa -slp-vectorizer -S \| FileCheck %s --check-prefixes=PREFER256

	@a64 = common global [8 x i64] zeroinitializer, align 64			@a64 = common global [8 x i64] zeroinitializer, align 64
	@b64 = common global [8 x i64] zeroinitializer, align 64			@b64 = common global [8 x i64] zeroinitializer, align 64
	@c64 = common global [8 x i64] zeroinitializer, align 64			@c64 = common global [8 x i64] zeroinitializer, align 64
	@a32 = common global [16 x i32] zeroinitializer, align 64			@a32 = common global [16 x i32] zeroinitializer, align 64
	@b32 = common global [16 x i32] zeroinitializer, align 64			@b32 = common global [16 x i32] zeroinitializer, align 64
	@c32 = common global [16 x i32] zeroinitializer, align 64			@c32 = common global [16 x i32] zeroinitializer, align 64
	@a16 = common global [32 x i16] zeroinitializer, align 64			@a16 = common global [32 x i16] zeroinitializer, align 64
	▲ Show 20 Lines • Show All 111 Lines • ▼ Show 20 Lines
	;			;
	; AVX512-LABEL: @add_v8i64(			; AVX512-LABEL: @add_v8i64(
	; AVX512-NEXT: [[TMP1:%.]] = load <8 x i64>, <8 x i64> bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8			; AVX512-NEXT: [[TMP1:%.]] = load <8 x i64>, <8 x i64> bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
	; AVX512-NEXT: [[TMP2:%.]] = load <8 x i64>, <8 x i64> bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8			; AVX512-NEXT: [[TMP2:%.]] = load <8 x i64>, <8 x i64> bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
	; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.sadd.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])			; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.sadd.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])
	; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8			; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	; AVX256BW-LABEL: @add_v8i64(			; PREFER256-LABEL: @add_v8i64(
	; AVX256BW-NEXT: [[TMP1:%.]] = load <4 x i64>, <4 x i64> bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8			; PREFER256-NEXT: [[TMP1:%.]] = load <4 x i64>, <4 x i64> bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
	; AVX256BW-NEXT: [[TMP2:%.]] = load <4 x i64>, <4 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8			; PREFER256-NEXT: [[TMP2:%.]] = load <4 x i64>, <4 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
	; AVX256BW-NEXT: [[TMP3:%.]] = load <4 x i64>, <4 x i64> bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8			; PREFER256-NEXT: [[TMP3:%.]] = load <4 x i64>, <4 x i64> bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
	; AVX256BW-NEXT: [[TMP4:%.]] = load <4 x i64>, <4 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8			; PREFER256-NEXT: [[TMP4:%.]] = load <4 x i64>, <4 x i64> bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
	; AVX256BW-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])			; PREFER256-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])
	; AVX256BW-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])			; PREFER256-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])
	; AVX256BW-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8			; PREFER256-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
	; AVX256BW-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8			; PREFER256-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
	; AVX256BW-NEXT: ret void			; PREFER256-NEXT: ret void
	;			;
				RKSimonUnsubmitted Done Reply Inline Actions please can you cleanup all these checks ? RKSimon: please can you cleanup all these checks ?
				anton-afanasyevAuthorUnsubmitted Done Reply Inline Actions Fixed this line in test or did you mean to precommit check prefixes? anton-afanasyev: Fixed this line in test or did you mean to precommit check prefixes?
				RKSimonUnsubmitted Not Done Reply Inline Actions We seem to have AVX and AVX1 check prefixes now - go back and replace the check-prefixes=AVX with check-prefixes=AVX1 (not sure if we can have a common AVX for AVX1 + AVX2)? RKSimon: We seem to have AVX and AVX1 check prefixes now - go back and replace the check-prefixes=AVX…
				anton-afanasyevAuthorUnsubmitted Done Reply Inline Actions Oh, I see. Done. (no, we can't use common AVX for AVX1 + AVX2 in that case). anton-afanasyev: Oh, I see. Done. (no, we can't use common AVX for AVX1 + AVX2 in that case).
	%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8			%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8
	%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8			%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8
	%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8			%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8
	%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8			%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8
	%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8			%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8
	%a5 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 5), align 8			%a5 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 5), align 8
	%a6 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6), align 8			%a6 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6), align 8
	%a7 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 7), align 8			%a7 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 7), align 8
	▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines
	; SLM-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])			; SLM-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])
	; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])			; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])
	; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4			; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
	; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4			; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
	; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4			; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
	; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4			; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
	; SLM-NEXT: ret void			; SLM-NEXT: ret void
	;			;
	; AVX-LABEL: @add_v16i32(			; AVX1-LABEL: @add_v16i32(
	; AVX-NEXT: [[TMP1:%.]] = load <8 x i32>, <8 x i32> bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4			; AVX1-NEXT: [[TMP1:%.]] = load <8 x i32>, <8 x i32> bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
	; AVX-NEXT: [[TMP2:%.]] = load <8 x i32>, <8 x i32> bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4			; AVX1-NEXT: [[TMP2:%.]] = load <8 x i32>, <8 x i32> bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
	; AVX-NEXT: [[TMP3:%.]] = load <8 x i32>, <8 x i32> bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4			; AVX1-NEXT: [[TMP3:%.]] = load <8 x i32>, <8 x i32> bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
	; AVX-NEXT: [[TMP4:%.]] = load <8 x i32>, <8 x i32> bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4			; AVX1-NEXT: [[TMP4:%.]] = load <8 x i32>, <8 x i32> bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
	; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])			; AVX1-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])
	; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])			; AVX1-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])
	; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4			; AVX1-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
	; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4			; AVX1-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
	; AVX-NEXT: ret void			; AVX1-NEXT: ret void
				;
				; AVX2-LABEL: @add_v16i32(
				; AVX2-NEXT: [[TMP1:%.]] = load <8 x i32>, <8 x i32> bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
				; AVX2-NEXT: [[TMP2:%.]] = load <8 x i32>, <8 x i32> bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
				; AVX2-NEXT: [[TMP3:%.]] = load <8 x i32>, <8 x i32> bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
				; AVX2-NEXT: [[TMP4:%.]] = load <8 x i32>, <8 x i32> bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
				; AVX2-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])
				; AVX2-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])
				; AVX2-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
				; AVX2-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
				; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @add_v16i32(			; AVX512-LABEL: @add_v16i32(
	; AVX512-NEXT: [[TMP1:%.]] = load <16 x i32>, <16 x i32> bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4			; AVX512-NEXT: [[TMP1:%.]] = load <16 x i32>, <16 x i32> bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
	; AVX512-NEXT: [[TMP2:%.]] = load <16 x i32>, <16 x i32> bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4			; AVX512-NEXT: [[TMP2:%.]] = load <16 x i32>, <16 x i32> bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
	; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.sadd.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])			; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.sadd.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])
	; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4			; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
				; PREFER256-LABEL: @add_v16i32(
				; PREFER256-NEXT: [[TMP1:%.]] = load <8 x i32>, <8 x i32> bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
				; PREFER256-NEXT: [[TMP2:%.]] = load <8 x i32>, <8 x i32> bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
				; PREFER256-NEXT: [[TMP3:%.]] = load <8 x i32>, <8 x i32> bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
				; PREFER256-NEXT: [[TMP4:%.]] = load <8 x i32>, <8 x i32> bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
				; PREFER256-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])
				; PREFER256-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])
				; PREFER256-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
				; PREFER256-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
				; PREFER256-NEXT: ret void
				;
	%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4			%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4
	%a1 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1 ), align 4			%a1 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1 ), align 4
	%a2 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2 ), align 4			%a2 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2 ), align 4
	%a3 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3 ), align 4			%a3 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3 ), align 4
	%a4 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4 ), align 4			%a4 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4 ), align 4
	%a5 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 5 ), align 4			%a5 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 5 ), align 4
	%a6 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 6 ), align 4			%a6 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 6 ), align 4
	%a7 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 7 ), align 4			%a7 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 7 ), align 4
	▲ Show 20 Lines • Show All 90 Lines • ▼ Show 20 Lines
	; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])			; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])
	; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])			; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])
	; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2			; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
	; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2			; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
	; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2			; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
	; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2			; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
	; SLM-NEXT: ret void			; SLM-NEXT: ret void
	;			;
	; AVX-LABEL: @add_v32i16(			; AVX1-LABEL: @add_v32i16(
	; AVX-NEXT: [[TMP1:%.]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP1:%.]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP2:%.]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP2:%.]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP3:%.]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP3:%.]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP4:%.]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP4:%.]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])			; AVX1-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])
	; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])			; AVX1-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])
	; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2			; AVX1-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
	; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2			; AVX1-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
	; AVX-NEXT: ret void			; AVX1-NEXT: ret void
				;
				; AVX2-LABEL: @add_v32i16(
				; AVX2-NEXT: [[TMP1:%.]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP2:%.]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP3:%.]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP4:%.]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])
				; AVX2-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])
				; AVX2-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
				; AVX2-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
				; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @add_v32i16(			; AVX512-LABEL: @add_v32i16(
	; AVX512-NEXT: [[TMP1:%.]] = load <32 x i16>, <32 x i16> bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2			; AVX512-NEXT: [[TMP1:%.]] = load <32 x i16>, <32 x i16> bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
	; AVX512-NEXT: [[TMP2:%.]] = load <32 x i16>, <32 x i16> bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2			; AVX512-NEXT: [[TMP2:%.]] = load <32 x i16>, <32 x i16> bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
	; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.sadd.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])			; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.sadd.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])
	; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2			; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
				; PREFER256-LABEL: @add_v32i16(
				; PREFER256-NEXT: [[TMP1:%.]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
				; PREFER256-NEXT: [[TMP2:%.]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
				; PREFER256-NEXT: [[TMP3:%.]] = load <16 x i16>, <16 x i16> bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
				; PREFER256-NEXT: [[TMP4:%.]] = load <16 x i16>, <16 x i16> bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
				; PREFER256-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])
				; PREFER256-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])
				; PREFER256-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
				; PREFER256-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
				; PREFER256-NEXT: ret void
				;
	%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2			%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2
	%a1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 1 ), align 2			%a1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 1 ), align 2
	%a2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 2 ), align 2			%a2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 2 ), align 2
	%a3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 3 ), align 2			%a3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 3 ), align 2
	%a4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4 ), align 2			%a4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4 ), align 2
	%a5 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 5 ), align 2			%a5 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 5 ), align 2
	%a6 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 6 ), align 2			%a6 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 6 ), align 2
	%a7 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 7 ), align 2			%a7 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 7 ), align 2
	[... 154 lines not shown ...]
	; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])			; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])
	; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])			; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.sadd.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])
	; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1			; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
	; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1			; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
	; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1			; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
	; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1			; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
	; SLM-NEXT: ret void			; SLM-NEXT: ret void
	;			;
	; AVX-LABEL: @add_v64i8(			; AVX1-LABEL: @add_v64i8(
	; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])			; AVX1-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])
	; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])			; AVX1-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])
	; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1			; AVX1-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
	; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1			; AVX1-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
	; AVX-NEXT: ret void			; AVX1-NEXT: ret void
				;
				; AVX2-LABEL: @add_v64i8(
				; AVX2-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])
				; AVX2-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])
				; AVX2-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
				; AVX2-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
				; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @add_v64i8(			; AVX512-LABEL: @add_v64i8(
	; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1			; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
	; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1			; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
	; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.sadd.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])			; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.sadd.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])
	; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1			; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
				; PREFER256-LABEL: @add_v64i8(
				; PREFER256-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
				; PREFER256-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
				; PREFER256-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
				; PREFER256-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
				; PREFER256-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])
				; PREFER256-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])
				; PREFER256-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
				; PREFER256-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
				; PREFER256-NEXT: ret void
				;
	%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1			%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1
	%a1 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 1 ), align 1			%a1 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 1 ), align 1
	%a2 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 2 ), align 1			%a2 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 2 ), align 1
	%a3 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 3 ), align 1			%a3 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 3 ), align 1
	%a4 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 4 ), align 1			%a4 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 4 ), align 1
	%a5 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 5 ), align 1			%a5 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 5 ), align 1
	%a6 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 6 ), align 1			%a6 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 6 ), align 1
	%a7 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 7 ), align 1			%a7 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 7 ), align 1
	[... 250 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/arith-mul.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=SLM			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=SLM
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -mattr=-prefer-128-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX --check-prefix=AVX1			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -mattr=-prefer-128-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX --check-prefix=AVX1
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -mattr=+prefer-128-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -mattr=+prefer-128-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX_PREFER128
				RKSimon: --check-prefixes=AVX_PREFER128,AVX1_PREFER128
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -mattr=-prefer-128-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX --check-prefix=AVX2			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -mattr=-prefer-128-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX --check-prefix=AVX2
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -mattr=+prefer-128-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -mattr=+prefer-128-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX2_PREFER128
				RKSimon: --check-prefixes=AVX_PREFER128,AVX2_PREFER128
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX512			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX512
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=-prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX512			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=-prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX512
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=+prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX --check-prefix=AVX2			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=+prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefix=AVX --check-prefix=AVX2

	@a64 = common global [8 x i64] zeroinitializer, align 64			@a64 = common global [8 x i64] zeroinitializer, align 64
	@b64 = common global [8 x i64] zeroinitializer, align 64			@b64 = common global [8 x i64] zeroinitializer, align 64
	@c64 = common global [8 x i64] zeroinitializer, align 64			@c64 = common global [8 x i64] zeroinitializer, align 64
	@a32 = common global [16 x i32] zeroinitializer, align 64			@a32 = common global [16 x i32] zeroinitializer, align 64
	[... 107 lines not shown ...]
	; AVX1-NEXT: store i64 [[R2]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2), align 8			; AVX1-NEXT: store i64 [[R2]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2), align 8
	; AVX1-NEXT: store i64 [[R3]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 3), align 8			; AVX1-NEXT: store i64 [[R3]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 3), align 8
	; AVX1-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8			; AVX1-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
	; AVX1-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8			; AVX1-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
	; AVX1-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8			; AVX1-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
	; AVX1-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8			; AVX1-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
	; AVX1-NEXT: ret void			; AVX1-NEXT: ret void
	;			;
				; AVX_PREFER128-LABEL: @mul_v8i64(
				; AVX_PREFER128-NEXT: [[A0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8
				; AVX_PREFER128-NEXT: [[A1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8
				; AVX_PREFER128-NEXT: [[A2:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8
				; AVX_PREFER128-NEXT: [[A3:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8
				; AVX_PREFER128-NEXT: [[A4:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8
				; AVX_PREFER128-NEXT: [[A5:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 5), align 8
				; AVX_PREFER128-NEXT: [[A6:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6), align 8
				; AVX_PREFER128-NEXT: [[A7:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 7), align 8
				; AVX_PREFER128-NEXT: [[B0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 0), align 8
				; AVX_PREFER128-NEXT: [[B1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 1), align 8
				; AVX_PREFER128-NEXT: [[B2:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 2), align 8
				; AVX_PREFER128-NEXT: [[B3:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 3), align 8
				; AVX_PREFER128-NEXT: [[B4:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4), align 8
				; AVX_PREFER128-NEXT: [[B5:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 5), align 8
				; AVX_PREFER128-NEXT: [[B6:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 6), align 8
				; AVX_PREFER128-NEXT: [[B7:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 7), align 8
				; AVX_PREFER128-NEXT: [[R0:%.*]] = mul i64 [[A0]], [[B0]]
				; AVX_PREFER128-NEXT: [[R1:%.*]] = mul i64 [[A1]], [[B1]]
				; AVX_PREFER128-NEXT: [[R2:%.*]] = mul i64 [[A2]], [[B2]]
				; AVX_PREFER128-NEXT: [[R3:%.*]] = mul i64 [[A3]], [[B3]]
				; AVX_PREFER128-NEXT: [[R4:%.*]] = mul i64 [[A4]], [[B4]]
				; AVX_PREFER128-NEXT: [[R5:%.*]] = mul i64 [[A5]], [[B5]]
				; AVX_PREFER128-NEXT: [[R6:%.*]] = mul i64 [[A6]], [[B6]]
				; AVX_PREFER128-NEXT: [[R7:%.*]] = mul i64 [[A7]], [[B7]]
				; AVX_PREFER128-NEXT: store i64 [[R0]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 0), align 8
				; AVX_PREFER128-NEXT: store i64 [[R1]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 1), align 8
				; AVX_PREFER128-NEXT: store i64 [[R2]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 2), align 8
				; AVX_PREFER128-NEXT: store i64 [[R3]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 3), align 8
				; AVX_PREFER128-NEXT: store i64 [[R4]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4), align 8
				; AVX_PREFER128-NEXT: store i64 [[R5]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 5), align 8
				; AVX_PREFER128-NEXT: store i64 [[R6]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 6), align 8
				; AVX_PREFER128-NEXT: store i64 [[R7]], i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 7), align 8
				; AVX_PREFER128-NEXT: ret void
				;
	; AVX2-LABEL: @mul_v8i64(			; AVX2-LABEL: @mul_v8i64(
	; AVX2-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8			; AVX2-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
	; AVX2-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8			; AVX2-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
	; AVX2-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8			; AVX2-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
	; AVX2-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8			; AVX2-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
	; AVX2-NEXT: [[TMP5:%.*]] = mul <4 x i64> [[TMP1]], [[TMP3]]			; AVX2-NEXT: [[TMP5:%.*]] = mul <4 x i64> [[TMP1]], [[TMP3]]
	; AVX2-NEXT: [[TMP6:%.*]] = mul <4 x i64> [[TMP2]], [[TMP4]]			; AVX2-NEXT: [[TMP6:%.*]] = mul <4 x i64> [[TMP2]], [[TMP4]]
	; AVX2-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8			; AVX2-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
	; AVX2-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8			; AVX2-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
				; AVX2_PREFER128-LABEL: @mul_v8i64(
				; AVX2_PREFER128-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
				; AVX2_PREFER128-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
				; AVX2_PREFER128-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
				; AVX2_PREFER128-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
				; AVX2_PREFER128-NEXT: [[TMP5:%.*]] = mul <4 x i64> [[TMP1]], [[TMP3]]
				; AVX2_PREFER128-NEXT: [[TMP6:%.*]] = mul <4 x i64> [[TMP2]], [[TMP4]]
				RKSimon: We're setting prefer-128-bit and yet still generating <4 x i64> ops?
				anton-afanasyev (author): Hmm, yes, you're right, it's strange to generate `<4 x i64>` when the preferred vector width is 128 bits. But we can't check this at the abstract LLVM level. Generally we don't know the target constraints, so this patch looks too tricky for such cases.
				; AVX2_PREFER128-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
				; AVX2_PREFER128-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
				; AVX2_PREFER128-NEXT: ret void
				;
	; AVX512-LABEL: @mul_v8i64(			; AVX512-LABEL: @mul_v8i64(
	; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8			; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
	; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8			; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
	; AVX512-NEXT: [[TMP3:%.*]] = mul <8 x i64> [[TMP1]], [[TMP2]]			; AVX512-NEXT: [[TMP3:%.*]] = mul <8 x i64> [[TMP1]], [[TMP2]]
	; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8			; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8			%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8
	[... 76 lines not shown ...]
	; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4			; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
	; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4			; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
	; AVX-NEXT: [[TMP5:%.*]] = mul <8 x i32> [[TMP1]], [[TMP3]]			; AVX-NEXT: [[TMP5:%.*]] = mul <8 x i32> [[TMP1]], [[TMP3]]
	; AVX-NEXT: [[TMP6:%.*]] = mul <8 x i32> [[TMP2]], [[TMP4]]			; AVX-NEXT: [[TMP6:%.*]] = mul <8 x i32> [[TMP2]], [[TMP4]]
	; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4			; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
	; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4			; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
				; AVX_PREFER128-LABEL: @mul_v16i32(
				; AVX_PREFER128-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: [[TMP9:%.*]] = mul <4 x i32> [[TMP1]], [[TMP5]]
				; AVX_PREFER128-NEXT: [[TMP10:%.*]] = mul <4 x i32> [[TMP2]], [[TMP6]]
				; AVX_PREFER128-NEXT: [[TMP11:%.*]] = mul <4 x i32> [[TMP3]], [[TMP7]]
				; AVX_PREFER128-NEXT: [[TMP12:%.*]] = mul <4 x i32> [[TMP4]], [[TMP8]]
				; AVX_PREFER128-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
				; AVX_PREFER128-NEXT: ret void
				;
				; AVX2_PREFER128-LABEL: @mul_v16i32(
				; AVX2_PREFER128-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @a32 to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @b32 to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: [[TMP7:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: [[TMP9:%.*]] = mul <4 x i32> [[TMP1]], [[TMP5]]
				; AVX2_PREFER128-NEXT: [[TMP10:%.*]] = mul <4 x i32> [[TMP2]], [[TMP6]]
				; AVX2_PREFER128-NEXT: [[TMP11:%.*]] = mul <4 x i32> [[TMP3]], [[TMP7]]
				; AVX2_PREFER128-NEXT: [[TMP12:%.*]] = mul <4 x i32> [[TMP4]], [[TMP8]]
				; AVX2_PREFER128-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
				; AVX2_PREFER128-NEXT: ret void
				;
	; AVX512-LABEL: @mul_v16i32(			; AVX512-LABEL: @mul_v16i32(
	; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4			; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
	; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4			; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
	; AVX512-NEXT: [[TMP3:%.*]] = mul <16 x i32> [[TMP1]], [[TMP2]]			; AVX512-NEXT: [[TMP3:%.*]] = mul <16 x i32> [[TMP1]], [[TMP2]]
	; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4			; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4			%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4
	[... 108 lines not shown ...]
	; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2			; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2			; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP5:%.*]] = mul <16 x i16> [[TMP1]], [[TMP3]]			; AVX-NEXT: [[TMP5:%.*]] = mul <16 x i16> [[TMP1]], [[TMP3]]
	; AVX-NEXT: [[TMP6:%.*]] = mul <16 x i16> [[TMP2]], [[TMP4]]			; AVX-NEXT: [[TMP6:%.*]] = mul <16 x i16> [[TMP2]], [[TMP4]]
	; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2			; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
	; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2			; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
				; AVX_PREFER128-LABEL: @mul_v32i16(
				; AVX_PREFER128-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: [[TMP9:%.*]] = mul <8 x i16> [[TMP1]], [[TMP5]]
				; AVX_PREFER128-NEXT: [[TMP10:%.*]] = mul <8 x i16> [[TMP2]], [[TMP6]]
				; AVX_PREFER128-NEXT: [[TMP11:%.*]] = mul <8 x i16> [[TMP3]], [[TMP7]]
				; AVX_PREFER128-NEXT: [[TMP12:%.*]] = mul <8 x i16> [[TMP4]], [[TMP8]]
				; AVX_PREFER128-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
				; AVX_PREFER128-NEXT: ret void
				;
				; AVX2_PREFER128-LABEL: @mul_v32i16(
				; AVX2_PREFER128-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @a16 to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: [[TMP3:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: [[TMP5:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @b16 to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: [[TMP6:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: [[TMP7:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: [[TMP8:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: [[TMP9:%.*]] = mul <8 x i16> [[TMP1]], [[TMP5]]
				; AVX2_PREFER128-NEXT: [[TMP10:%.*]] = mul <8 x i16> [[TMP2]], [[TMP6]]
				; AVX2_PREFER128-NEXT: [[TMP11:%.*]] = mul <8 x i16> [[TMP3]], [[TMP7]]
				; AVX2_PREFER128-NEXT: [[TMP12:%.*]] = mul <8 x i16> [[TMP4]], [[TMP8]]
				; AVX2_PREFER128-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
				; AVX2_PREFER128-NEXT: ret void
				;
	; AVX512-LABEL: @mul_v32i16(			; AVX512-LABEL: @mul_v32i16(
	; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2			; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
	; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2			; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
	; AVX512-NEXT: [[TMP3:%.*]] = mul <32 x i16> [[TMP1]], [[TMP2]]			; AVX512-NEXT: [[TMP3:%.*]] = mul <32 x i16> [[TMP1]], [[TMP2]]
	; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2			; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2			%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2
	; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1			; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1			; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP5:%.*]] = mul <32 x i8> [[TMP1]], [[TMP3]]			; AVX-NEXT: [[TMP5:%.*]] = mul <32 x i8> [[TMP1]], [[TMP3]]
	; AVX-NEXT: [[TMP6:%.*]] = mul <32 x i8> [[TMP2]], [[TMP4]]			; AVX-NEXT: [[TMP6:%.*]] = mul <32 x i8> [[TMP2]], [[TMP4]]
	; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1			; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
	; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1			; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
				; AVX_PREFER128-LABEL: @mul_v64i8(
				; AVX_PREFER128-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: [[TMP9:%.*]] = mul <16 x i8> [[TMP1]], [[TMP5]]
				; AVX_PREFER128-NEXT: [[TMP10:%.*]] = mul <16 x i8> [[TMP2]], [[TMP6]]
				; AVX_PREFER128-NEXT: [[TMP11:%.*]] = mul <16 x i8> [[TMP3]], [[TMP7]]
				; AVX_PREFER128-NEXT: [[TMP12:%.*]] = mul <16 x i8> [[TMP4]], [[TMP8]]
				; AVX_PREFER128-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
				; AVX_PREFER128-NEXT: ret void
				;
				; AVX2_PREFER128-LABEL: @mul_v64i8(
				; AVX2_PREFER128-NEXT: [[TMP1:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @a8 to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: [[TMP2:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 16) to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: [[TMP3:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 48) to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* bitcast ([64 x i8]* @b8 to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: [[TMP6:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 16) to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: [[TMP7:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 48) to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: [[TMP9:%.*]] = mul <16 x i8> [[TMP1]], [[TMP5]]
				; AVX2_PREFER128-NEXT: [[TMP10:%.*]] = mul <16 x i8> [[TMP2]], [[TMP6]]
				; AVX2_PREFER128-NEXT: [[TMP11:%.*]] = mul <16 x i8> [[TMP3]], [[TMP7]]
				; AVX2_PREFER128-NEXT: [[TMP12:%.*]] = mul <16 x i8> [[TMP4]], [[TMP8]]
				; AVX2_PREFER128-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
				; AVX2_PREFER128-NEXT: ret void
				;
	; AVX512-LABEL: @mul_v64i8(			; AVX512-LABEL: @mul_v64i8(
	; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1			; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
	; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1			; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
	; AVX512-NEXT: [[TMP3:%.*]] = mul <64 x i8> [[TMP1]], [[TMP2]]			; AVX512-NEXT: [[TMP3:%.*]] = mul <64 x i8> [[TMP1]], [[TMP2]]
	; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1			; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1			%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1

llvm/test/Transforms/SLPVectorizer/X86/arith-sub-ssat.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SLM			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX1
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX2
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX512			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX512
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=-prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX512			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=-prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX512
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=+prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -mattr=+prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=PREFER256

	@a64 = common global [8 x i64] zeroinitializer, align 64			@a64 = common global [8 x i64] zeroinitializer, align 64
	@b64 = common global [8 x i64] zeroinitializer, align 64			@b64 = common global [8 x i64] zeroinitializer, align 64
	@c64 = common global [8 x i64] zeroinitializer, align 64			@c64 = common global [8 x i64] zeroinitializer, align 64
	@a32 = common global [16 x i32] zeroinitializer, align 64			@a32 = common global [16 x i32] zeroinitializer, align 64
	@b32 = common global [16 x i32] zeroinitializer, align 64			@b32 = common global [16 x i32] zeroinitializer, align 64
	@c32 = common global [16 x i32] zeroinitializer, align 64			@c32 = common global [16 x i32] zeroinitializer, align 64
	@a16 = common global [32 x i16] zeroinitializer, align 64			@a16 = common global [32 x i16] zeroinitializer, align 64
	;			;
	; AVX512-LABEL: @sub_v8i64(			; AVX512-LABEL: @sub_v8i64(
	; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8			; AVX512-NEXT: [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @a64 to <8 x i64>*), align 8
	; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8			; AVX512-NEXT: [[TMP2:%.*]] = load <8 x i64>, <8 x i64>* bitcast ([8 x i64]* @b64 to <8 x i64>*), align 8
	; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.ssub.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])			; AVX512-NEXT: [[TMP3:%.*]] = call <8 x i64> @llvm.ssub.sat.v8i64(<8 x i64> [[TMP1]], <8 x i64> [[TMP2]])
	; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8			; AVX512-NEXT: store <8 x i64> [[TMP3]], <8 x i64>* bitcast ([8 x i64]* @c64 to <8 x i64>*), align 8
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	; AVX256BW-LABEL: @sub_v8i64(			; PREFER256-LABEL: @sub_v8i64(
	; AVX256BW-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8			; PREFER256-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @a64 to <4 x i64>*), align 8
	; AVX256BW-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8			; PREFER256-NEXT: [[TMP2:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4) to <4 x i64>*), align 8
	; AVX256BW-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8			; PREFER256-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* bitcast ([8 x i64]* @b64 to <4 x i64>*), align 8
	; AVX256BW-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8			; PREFER256-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @b64, i32 0, i64 4) to <4 x i64>*), align 8
	; AVX256BW-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])			; PREFER256-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> [[TMP1]], <4 x i64> [[TMP3]])
	; AVX256BW-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])			; PREFER256-NEXT: [[TMP6:%.*]] = call <4 x i64> @llvm.ssub.sat.v4i64(<4 x i64> [[TMP2]], <4 x i64> [[TMP4]])
	; AVX256BW-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8			; PREFER256-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* bitcast ([8 x i64]* @c64 to <4 x i64>*), align 8
	; AVX256BW-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8			; PREFER256-NEXT: store <4 x i64> [[TMP6]], <4 x i64>* bitcast (i64* getelementptr inbounds ([8 x i64], [8 x i64]* @c64, i32 0, i64 4) to <4 x i64>*), align 8
	; AVX256BW-NEXT: ret void			; PREFER256-NEXT: ret void
	;			;
	%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8			%a0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 0), align 8
	%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8			%a1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 1), align 8
	%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8			%a2 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 2), align 8
	%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8			%a3 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 3), align 8
	%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8			%a4 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 4), align 8
	%a5 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 5), align 8			%a5 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 5), align 8
	%a6 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6), align 8			%a6 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @a64, i32 0, i64 6), align 8
	; SLM-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])			; SLM-NEXT: [[TMP11:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP3]], <4 x i32> [[TMP7]])
	; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])			; SLM-NEXT: [[TMP12:%.*]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[TMP4]], <4 x i32> [[TMP8]])
	; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4			; SLM-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* bitcast ([16 x i32]* @c32 to <4 x i32>*), align 4
	; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4			; SLM-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <4 x i32>*), align 4
	; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4			; SLM-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <4 x i32>*), align 4
	; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4			; SLM-NEXT: store <4 x i32> [[TMP12]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <4 x i32>*), align 4
	; SLM-NEXT: ret void			; SLM-NEXT: ret void
	;			;
	; AVX-LABEL: @sub_v16i32(			; AVX1-LABEL: @sub_v16i32(
	; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4			; AVX1-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
	; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4			; AVX1-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
	; AVX-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4			; AVX1-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
	; AVX-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4			; AVX1-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
	; AVX-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])			; AVX1-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])
	; AVX-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])			; AVX1-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])
	; AVX-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4			; AVX1-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
	; AVX-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4			; AVX1-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
	; AVX-NEXT: ret void			; AVX1-NEXT: ret void
				;
				; AVX2-LABEL: @sub_v16i32(
				; AVX2-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
				; AVX2-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
				; AVX2-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
				; AVX2-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
				; AVX2-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])
				; AVX2-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])
				; AVX2-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
				; AVX2-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
				; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @sub_v16i32(			; AVX512-LABEL: @sub_v16i32(
	; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4			; AVX512-NEXT: [[TMP1:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @a32 to <16 x i32>*), align 4
	; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4			; AVX512-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([16 x i32]* @b32 to <16 x i32>*), align 4
	; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.ssub.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])			; AVX512-NEXT: [[TMP3:%.*]] = call <16 x i32> @llvm.ssub.sat.v16i32(<16 x i32> [[TMP1]], <16 x i32> [[TMP2]])
	; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4			; AVX512-NEXT: store <16 x i32> [[TMP3]], <16 x i32>* bitcast ([16 x i32]* @c32 to <16 x i32>*), align 4
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
				; PREFER256-LABEL: @sub_v16i32(
				; PREFER256-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @a32 to <8 x i32>*), align 4
				; PREFER256-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <8 x i32>*), align 4
				; PREFER256-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([16 x i32]* @b32 to <8 x i32>*), align 4
				; PREFER256-NEXT: [[TMP4:%.*]] = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <8 x i32>*), align 4
				; PREFER256-NEXT: [[TMP5:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP1]], <8 x i32> [[TMP3]])
				; PREFER256-NEXT: [[TMP6:%.*]] = call <8 x i32> @llvm.ssub.sat.v8i32(<8 x i32> [[TMP2]], <8 x i32> [[TMP4]])
				; PREFER256-NEXT: store <8 x i32> [[TMP5]], <8 x i32>* bitcast ([16 x i32]* @c32 to <8 x i32>*), align 4
				; PREFER256-NEXT: store <8 x i32> [[TMP6]], <8 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <8 x i32>*), align 4
				; PREFER256-NEXT: ret void
				;
	%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4			%a0 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0 ), align 4
	%a1 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1 ), align 4			%a1 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1 ), align 4
	%a2 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2 ), align 4			%a2 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2 ), align 4
	%a3 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3 ), align 4			%a3 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3 ), align 4
	%a4 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4 ), align 4			%a4 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4 ), align 4
	%a5 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 5 ), align 4			%a5 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 5 ), align 4
	%a6 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 6 ), align 4			%a6 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 6 ), align 4
	%a7 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 7 ), align 4			%a7 = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 7 ), align 4
	; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])			; SLM-NEXT: [[TMP11:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP3]], <8 x i16> [[TMP7]])
	; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])			; SLM-NEXT: [[TMP12:%.*]] = call <8 x i16> @llvm.ssub.sat.v8i16(<8 x i16> [[TMP4]], <8 x i16> [[TMP8]])
	; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2			; SLM-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* bitcast ([32 x i16]* @c16 to <8 x i16>*), align 2
	; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2			; SLM-NEXT: store <8 x i16> [[TMP10]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <8 x i16>*), align 2
	; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2			; SLM-NEXT: store <8 x i16> [[TMP11]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <8 x i16>*), align 2
	; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2			; SLM-NEXT: store <8 x i16> [[TMP12]], <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <8 x i16>*), align 2
	; SLM-NEXT: ret void			; SLM-NEXT: ret void
	;			;
	; AVX-LABEL: @sub_v32i16(			; AVX1-LABEL: @sub_v32i16(
	; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])			; AVX1-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])
	; AVX-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])			; AVX1-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])
	; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2			; AVX1-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
	; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2			; AVX1-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
	; AVX-NEXT: ret void			; AVX1-NEXT: ret void
				;
				; AVX2-LABEL: @sub_v32i16(
				; AVX2-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])
				; AVX2-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])
				; AVX2-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
				; AVX2-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
				; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @sub_v32i16(			; AVX512-LABEL: @sub_v32i16(
	; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2			; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
	; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2			; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
	; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.ssub.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])			; AVX512-NEXT: [[TMP3:%.*]] = call <32 x i16> @llvm.ssub.sat.v32i16(<32 x i16> [[TMP1]], <32 x i16> [[TMP2]])
	; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2			; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
				; PREFER256-LABEL: @sub_v32i16(
				; PREFER256-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
				; PREFER256-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
				; PREFER256-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
				; PREFER256-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
				; PREFER256-NEXT: [[TMP5:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP1]], <16 x i16> [[TMP3]])
				; PREFER256-NEXT: [[TMP6:%.*]] = call <16 x i16> @llvm.ssub.sat.v16i16(<16 x i16> [[TMP2]], <16 x i16> [[TMP4]])
				; PREFER256-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
				; PREFER256-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
				; PREFER256-NEXT: ret void
				;
	%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2			%a0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0 ), align 2
	%a1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 1 ), align 2			%a1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 1 ), align 2
	%a2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 2 ), align 2			%a2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 2 ), align 2
	%a3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 3 ), align 2			%a3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 3 ), align 2
	%a4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4 ), align 2			%a4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4 ), align 2
	%a5 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 5 ), align 2			%a5 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 5 ), align 2
	%a6 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 6 ), align 2			%a6 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 6 ), align 2
	%a7 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 7 ), align 2			%a7 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 7 ), align 2
	; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])			; SLM-NEXT: [[TMP11:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP3]], <16 x i8> [[TMP7]])
	; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])			; SLM-NEXT: [[TMP12:%.*]] = call <16 x i8> @llvm.ssub.sat.v16i8(<16 x i8> [[TMP4]], <16 x i8> [[TMP8]])
	; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1			; SLM-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
	; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1			; SLM-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
	; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1			; SLM-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
	; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1			; SLM-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
	; SLM-NEXT: ret void			; SLM-NEXT: ret void
	;			;
	; AVX-LABEL: @sub_v64i8(			; AVX1-LABEL: @sub_v64i8(
	; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])			; AVX1-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])
	; AVX-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])			; AVX1-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])
	; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1			; AVX1-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
	; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1			; AVX1-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
	; AVX-NEXT: ret void			; AVX1-NEXT: ret void
				;
				; AVX2-LABEL: @sub_v64i8(
				; AVX2-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])
				; AVX2-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])
				; AVX2-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
				; AVX2-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
				; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @sub_v64i8(			; AVX512-LABEL: @sub_v64i8(
	; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1			; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
	; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1			; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
	; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.ssub.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])			; AVX512-NEXT: [[TMP3:%.*]] = call <64 x i8> @llvm.ssub.sat.v64i8(<64 x i8> [[TMP1]], <64 x i8> [[TMP2]])
	; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1			; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
				; PREFER256-LABEL: @sub_v64i8(
				; PREFER256-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
				; PREFER256-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
				; PREFER256-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
				; PREFER256-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
				; PREFER256-NEXT: [[TMP5:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP1]], <32 x i8> [[TMP3]])
				; PREFER256-NEXT: [[TMP6:%.*]] = call <32 x i8> @llvm.ssub.sat.v32i8(<32 x i8> [[TMP2]], <32 x i8> [[TMP4]])
				; PREFER256-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
				; PREFER256-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
				; PREFER256-NEXT: ret void
				;
	%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1			%a0 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 0 ), align 1
	%a1 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 1 ), align 1			%a1 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 1 ), align 1
	%a2 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 2 ), align 1			%a2 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 2 ), align 1
	%a3 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 3 ), align 1			%a3 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 3 ), align 1
	%a4 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 4 ), align 1			%a4 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 4 ), align 1
	%a5 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 5 ), align 1			%a5 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 5 ), align 1
	%a6 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 6 ), align 1			%a6 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 6 ), align 1
	%a7 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 7 ), align 1			%a7 = load i8, i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 7 ), align 1

llvm/test/Transforms/SLPVectorizer/X86/pr47623.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+sse2 | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx2 | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512f | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -slp-vectorizer -instcombine -S -mtriple=x86_64-unknown-linux -mattr=+avx512vl | FileCheck %s --check-prefixes=AVX


	@b = global [8 x i32] zeroinitializer, align 16			@b = global [8 x i32] zeroinitializer, align 16
	@a = global [8 x i32] zeroinitializer, align 16			@a = global [8 x i32] zeroinitializer, align 16

	define void @foo() {			define void @foo() {
	; SSE-LABEL: @foo(			; SSE-LABEL: @foo(
	; SSE-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), align 16			; SSE-NEXT: [[TMP1:%.*]] = call <2 x i32> @llvm.masked.gather.v2i32.v2p0i32(<2 x i32*> <i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2)>, i32 8, <2 x i1> <i1 true, i1 true>, <2 x i32> undef)
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 0), align 16			; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
	; SSE-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2), align 8			; SSE-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 1), align 4
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 2), align 8
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 16
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4
	; SSE-NEXT: store i32 [[TMP1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 8
	; SSE-NEXT: store i32 [[TMP2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @foo(			; AVX-LABEL: @foo(
	; AVX-NEXT: [[TMP1:%.*]] = call <2 x i32> @llvm.masked.gather.v2i32.v2p0i32(<2 x i32*> <i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2)>, i32 8, <2 x i1> <i1 true, i1 true>, <2 x i32> undef)			; AVX-NEXT: [[TMP1:%.*]] = call <2 x i32> @llvm.masked.gather.v2i32.v2p0i32(<2 x i32*> <i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 0), i32* getelementptr inbounds ([8 x i32], [8 x i32]* @b, i64 0, i64 2)>, i32 8, <2 x i1> <i1 true, i1 true>, <2 x i32> undef)
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
	; AVX-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16			; AVX-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* bitcast ([8 x i32]* @a to <8 x i32>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;

llvm/test/Transforms/SLPVectorizer/X86/shift-ashr.ll

	; SSE-NEXT: store i16 [[R26]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 26), align 2			; SSE-NEXT: store i16 [[R26]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 26), align 2
	; SSE-NEXT: store i16 [[R27]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 27), align 2			; SSE-NEXT: store i16 [[R27]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 27), align 2
	; SSE-NEXT: store i16 [[R28]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 28), align 2			; SSE-NEXT: store i16 [[R28]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 28), align 2
	; SSE-NEXT: store i16 [[R29]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 29), align 2			; SSE-NEXT: store i16 [[R29]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 29), align 2
	; SSE-NEXT: store i16 [[R30]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2			; SSE-NEXT: store i16 [[R30]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
	; SSE-NEXT: store i16 [[R31]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2			; SSE-NEXT: store i16 [[R31]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @ashr_v32i16(			; AVX1-LABEL: @ashr_v32i16(
	; AVX-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2			; AVX1-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
	; AVX-NEXT: [[TMP5:%.*]] = ashr <16 x i16> [[TMP1]], [[TMP3]]			; AVX1-NEXT: [[TMP5:%.*]] = ashr <16 x i16> [[TMP1]], [[TMP3]]
	; AVX-NEXT: [[TMP6:%.*]] = ashr <16 x i16> [[TMP2]], [[TMP4]]			; AVX1-NEXT: [[TMP6:%.*]] = ashr <16 x i16> [[TMP2]], [[TMP4]]
	; AVX-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2			; AVX1-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
	; AVX-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2			; AVX1-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
	; AVX-NEXT: ret void			; AVX1-NEXT: ret void
				;
				; AVX2-LABEL: @ashr_v32i16(
				; AVX2-NEXT: [[TMP1:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @a16 to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP2:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP3:%.*]] = load <16 x i16>, <16 x i16>* bitcast ([32 x i16]* @b16 to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP4:%.*]] = load <16 x i16>, <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <16 x i16>*), align 2
				; AVX2-NEXT: [[TMP5:%.*]] = ashr <16 x i16> [[TMP1]], [[TMP3]]
				; AVX2-NEXT: [[TMP6:%.*]] = ashr <16 x i16> [[TMP2]], [[TMP4]]
				; AVX2-NEXT: store <16 x i16> [[TMP5]], <16 x i16>* bitcast ([32 x i16]* @c16 to <16 x i16>*), align 2
				; AVX2-NEXT: store <16 x i16> [[TMP6]], <16 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <16 x i16>*), align 2
				; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @ashr_v32i16(			; AVX512-LABEL: @ashr_v32i16(
	; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2			; AVX512-NEXT: [[TMP1:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @a16 to <32 x i16>*), align 2
	; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2			; AVX512-NEXT: [[TMP2:%.*]] = load <32 x i16>, <32 x i16>* bitcast ([32 x i16]* @b16 to <32 x i16>*), align 2
	; AVX512-NEXT: [[TMP3:%.*]] = ashr <32 x i16> [[TMP1]], [[TMP2]]			; AVX512-NEXT: [[TMP3:%.*]] = ashr <32 x i16> [[TMP1]], [[TMP2]]
	; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2			; AVX512-NEXT: store <32 x i16> [[TMP3]], <32 x i16>* bitcast ([32 x i16]* @c16 to <32 x i16>*), align 2
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	; SSE-NEXT: [[TMP11:%.*]] = ashr <16 x i8> [[TMP3]], [[TMP7]]			; SSE-NEXT: [[TMP11:%.*]] = ashr <16 x i8> [[TMP3]], [[TMP7]]
	; SSE-NEXT: [[TMP12:%.*]] = ashr <16 x i8> [[TMP4]], [[TMP8]]			; SSE-NEXT: [[TMP12:%.*]] = ashr <16 x i8> [[TMP4]], [[TMP8]]
	; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1			; SSE-NEXT: store <16 x i8> [[TMP9]], <16 x i8>* bitcast ([64 x i8]* @c8 to <16 x i8>*), align 1
	; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1			; SSE-NEXT: store <16 x i8> [[TMP10]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 16) to <16 x i8>*), align 1
	; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1			; SSE-NEXT: store <16 x i8> [[TMP11]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <16 x i8>*), align 1
	; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1			; SSE-NEXT: store <16 x i8> [[TMP12]], <16 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 48) to <16 x i8>*), align 1
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @ashr_v64i8(			; AVX1-LABEL: @ashr_v64i8(
	; AVX-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1			; AVX1-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
	; AVX-NEXT: [[TMP5:%.*]] = ashr <32 x i8> [[TMP1]], [[TMP3]]			; AVX1-NEXT: [[TMP5:%.*]] = ashr <32 x i8> [[TMP1]], [[TMP3]]
	; AVX-NEXT: [[TMP6:%.*]] = ashr <32 x i8> [[TMP2]], [[TMP4]]			; AVX1-NEXT: [[TMP6:%.*]] = ashr <32 x i8> [[TMP2]], [[TMP4]]
	; AVX-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1			; AVX1-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
	; AVX-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1			; AVX1-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
	; AVX-NEXT: ret void			; AVX1-NEXT: ret void
				;
				; AVX2-LABEL: @ashr_v64i8(
				; AVX2-NEXT: [[TMP1:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @a8 to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP2:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @a8, i32 0, i64 32) to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP3:%.*]] = load <32 x i8>, <32 x i8>* bitcast ([64 x i8]* @b8 to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP4:%.*]] = load <32 x i8>, <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @b8, i32 0, i64 32) to <32 x i8>*), align 1
				; AVX2-NEXT: [[TMP5:%.*]] = ashr <32 x i8> [[TMP1]], [[TMP3]]
				; AVX2-NEXT: [[TMP6:%.*]] = ashr <32 x i8> [[TMP2]], [[TMP4]]
				; AVX2-NEXT: store <32 x i8> [[TMP5]], <32 x i8>* bitcast ([64 x i8]* @c8 to <32 x i8>*), align 1
				; AVX2-NEXT: store <32 x i8> [[TMP6]], <32 x i8>* bitcast (i8* getelementptr inbounds ([64 x i8], [64 x i8]* @c8, i32 0, i64 32) to <32 x i8>*), align 1
				; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @ashr_v64i8(			; AVX512-LABEL: @ashr_v64i8(
	; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1			; AVX512-NEXT: [[TMP1:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @a8 to <64 x i8>*), align 1
	; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1			; AVX512-NEXT: [[TMP2:%.*]] = load <64 x i8>, <64 x i8>* bitcast ([64 x i8]* @b8 to <64 x i8>*), align 1
	; AVX512-NEXT: [[TMP3:%.*]] = ashr <64 x i8> [[TMP1]], [[TMP2]]			; AVX512-NEXT: [[TMP3:%.*]] = ashr <64 x i8> [[TMP1]], [[TMP2]]
	; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1			; AVX512-NEXT: store <64 x i8> [[TMP3]], <64 x i8>* bitcast ([64 x i8]* @c8 to <64 x i8>*), align 1
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;

llvm/test/Transforms/SLPVectorizer/X86/sitofp.ll

store float %cvt0, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64		store float %cvt0, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
store float %cvt1, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4		store float %cvt1, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @sitofp_8i16_8f32() #0 {		define void @sitofp_8i16_8f32() #0 {
; SSE-LABEL: @sitofp_8i16_8f32(		; CHECK-LABEL: @sitofp_8i16_8f32(
; SSE-NEXT: [[LD0:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 0), align 64		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; SSE-NEXT: [[LD1:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 1), align 2		; CHECK-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>
; SSE-NEXT: [[LD2:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2), align 4		; CHECK-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; SSE-NEXT: [[LD3:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 3), align 2		; CHECK-NEXT: ret void
; SSE-NEXT: [[LD4:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4), align 8
; SSE-NEXT: [[LD5:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 5), align 2
; SSE-NEXT: [[LD6:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6), align 4
; SSE-NEXT: [[LD7:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 7), align 2
; SSE-NEXT: [[CVT0:%.*]] = sitofp i16 [[LD0]] to float
; SSE-NEXT: [[CVT1:%.*]] = sitofp i16 [[LD1]] to float
; SSE-NEXT: [[CVT2:%.*]] = sitofp i16 [[LD2]] to float
; SSE-NEXT: [[CVT3:%.*]] = sitofp i16 [[LD3]] to float
; SSE-NEXT: [[CVT4:%.*]] = sitofp i16 [[LD4]] to float
; SSE-NEXT: [[CVT5:%.*]] = sitofp i16 [[LD5]] to float
; SSE-NEXT: [[CVT6:%.*]] = sitofp i16 [[LD6]] to float
; SSE-NEXT: [[CVT7:%.*]] = sitofp i16 [[LD7]] to float
; SSE-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
; SSE-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
; SSE-NEXT: store float [[CVT2]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
; SSE-NEXT: store float [[CVT3]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
; SSE-NEXT: store float [[CVT4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 16
; SSE-NEXT: store float [[CVT5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4
; SSE-NEXT: store float [[CVT6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
; SSE-NEXT: store float [[CVT7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
; SSE-NEXT: ret void
;
; AVX-LABEL: @sitofp_8i16_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void
;		;
%ld0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 0), align 64		%ld0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 0), align 64
%ld1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 1), align 2		%ld1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 1), align 2
%ld2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2), align 4		%ld2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2), align 4
%ld3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 3), align 2		%ld3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 3), align 2
%ld4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4), align 8		%ld4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4), align 8
%ld5 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 5), align 2		%ld5 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 5), align 2
%ld6 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6), align 4		%ld6 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6), align 4
[14 lines not shown]
store float %cvt5, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4		store float %cvt5, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4
store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8		store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @sitofp_16i16_16f32() #0 {		define void @sitofp_16i16_16f32() #0 {
; SSE-LABEL: @sitofp_16i16_16f32(		; SSE-LABEL: @sitofp_16i16_16f32(
; SSE-NEXT: [[LD0:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 0), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; SSE-NEXT: [[LD1:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 1), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16
; SSE-NEXT: [[LD2:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2), align 4		; SSE-NEXT: [[TMP3:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>
; SSE-NEXT: [[LD3:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 3), align 2		; SSE-NEXT: [[TMP4:%.*]] = sitofp <8 x i16> [[TMP2]] to <8 x float>
; SSE-NEXT: [[LD4:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4), align 8		; SSE-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; SSE-NEXT: [[LD5:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 5), align 2		; SSE-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32
; SSE-NEXT: [[LD6:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6), align 4
; SSE-NEXT: [[LD7:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 7), align 2
; SSE-NEXT: [[LD8:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8), align 16
; SSE-NEXT: [[LD9:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 9), align 2
; SSE-NEXT: [[LD10:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 10), align 4
; SSE-NEXT: [[LD11:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 11), align 2
; SSE-NEXT: [[LD12:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 12), align 8
; SSE-NEXT: [[LD13:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 13), align 2
; SSE-NEXT: [[LD14:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 14), align 4
; SSE-NEXT: [[LD15:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 15), align 2
; SSE-NEXT: [[CVT0:%.*]] = sitofp i16 [[LD0]] to float
; SSE-NEXT: [[CVT1:%.*]] = sitofp i16 [[LD1]] to float
; SSE-NEXT: [[CVT2:%.*]] = sitofp i16 [[LD2]] to float
; SSE-NEXT: [[CVT3:%.*]] = sitofp i16 [[LD3]] to float
; SSE-NEXT: [[CVT4:%.*]] = sitofp i16 [[LD4]] to float
; SSE-NEXT: [[CVT5:%.*]] = sitofp i16 [[LD5]] to float
; SSE-NEXT: [[CVT6:%.*]] = sitofp i16 [[LD6]] to float
; SSE-NEXT: [[CVT7:%.*]] = sitofp i16 [[LD7]] to float
; SSE-NEXT: [[CVT8:%.*]] = sitofp i16 [[LD8]] to float
; SSE-NEXT: [[CVT9:%.*]] = sitofp i16 [[LD9]] to float
; SSE-NEXT: [[CVT10:%.*]] = sitofp i16 [[LD10]] to float
; SSE-NEXT: [[CVT11:%.*]] = sitofp i16 [[LD11]] to float
; SSE-NEXT: [[CVT12:%.*]] = sitofp i16 [[LD12]] to float
; SSE-NEXT: [[CVT13:%.*]] = sitofp i16 [[LD13]] to float
; SSE-NEXT: [[CVT14:%.*]] = sitofp i16 [[LD14]] to float
; SSE-NEXT: [[CVT15:%.*]] = sitofp i16 [[LD15]] to float
; SSE-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
; SSE-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
; SSE-NEXT: store float [[CVT2]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
; SSE-NEXT: store float [[CVT3]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
; SSE-NEXT: store float [[CVT4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 16
; SSE-NEXT: store float [[CVT5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4
; SSE-NEXT: store float [[CVT6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
; SSE-NEXT: store float [[CVT7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
; SSE-NEXT: store float [[CVT8]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8), align 32
; SSE-NEXT: store float [[CVT9]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 9), align 4
; SSE-NEXT: store float [[CVT10]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 10), align 8
; SSE-NEXT: store float [[CVT11]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 11), align 4
; SSE-NEXT: store float [[CVT12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 16
; SSE-NEXT: store float [[CVT13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4
; SSE-NEXT: store float [[CVT14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 8
; SSE-NEXT: store float [[CVT15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @sitofp_16i16_16f32(		; AVX256-LABEL: @sitofp_16i16_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16		; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16
; AVX256-NEXT: [[TMP3:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>		; AVX256-NEXT: [[TMP3:%.*]] = sitofp <8 x i16> [[TMP1]] to <8 x float>
; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i16> [[TMP2]] to <8 x float>		; AVX256-NEXT: [[TMP4:%.*]] = sitofp <8 x i16> [[TMP2]] to <8 x float>
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64

llvm/test/Transforms/SLPVectorizer/X86/uitofp.ll

store float %cvt0, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64		store float %cvt0, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
store float %cvt1, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4		store float %cvt1, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8		store float %cvt2, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4		store float %cvt3, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @uitofp_8i16_8f32() #0 {		define void @uitofp_8i16_8f32() #0 {
; SSE-LABEL: @uitofp_8i16_8f32(		; CHECK-LABEL: @uitofp_8i16_8f32(
; SSE-NEXT: [[LD0:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 0), align 64		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; SSE-NEXT: [[LD1:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 1), align 2		; CHECK-NEXT: [[TMP2:%.*]] = uitofp <8 x i16> [[TMP1]] to <8 x float>
; SSE-NEXT: [[LD2:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2), align 4		; CHECK-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; SSE-NEXT: [[LD3:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 3), align 2		; CHECK-NEXT: ret void
; SSE-NEXT: [[LD4:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4), align 8
; SSE-NEXT: [[LD5:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 5), align 2
; SSE-NEXT: [[LD6:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6), align 4
; SSE-NEXT: [[LD7:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 7), align 2
; SSE-NEXT: [[CVT0:%.*]] = uitofp i16 [[LD0]] to float
; SSE-NEXT: [[CVT1:%.*]] = uitofp i16 [[LD1]] to float
; SSE-NEXT: [[CVT2:%.*]] = uitofp i16 [[LD2]] to float
; SSE-NEXT: [[CVT3:%.*]] = uitofp i16 [[LD3]] to float
; SSE-NEXT: [[CVT4:%.*]] = uitofp i16 [[LD4]] to float
; SSE-NEXT: [[CVT5:%.*]] = uitofp i16 [[LD5]] to float
; SSE-NEXT: [[CVT6:%.*]] = uitofp i16 [[LD6]] to float
; SSE-NEXT: [[CVT7:%.*]] = uitofp i16 [[LD7]] to float
; SSE-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
; SSE-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
; SSE-NEXT: store float [[CVT2]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
; SSE-NEXT: store float [[CVT3]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
; SSE-NEXT: store float [[CVT4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 16
; SSE-NEXT: store float [[CVT5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4
; SSE-NEXT: store float [[CVT6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
; SSE-NEXT: store float [[CVT7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
; SSE-NEXT: ret void
;
; AVX-LABEL: @uitofp_8i16_8f32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX-NEXT: [[TMP2:%.*]] = uitofp <8 x i16> [[TMP1]] to <8 x float>
; AVX-NEXT: store <8 x float> [[TMP2]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; AVX-NEXT: ret void
;		;
%ld0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 0), align 64		%ld0 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 0), align 64
%ld1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 1), align 2		%ld1 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 1), align 2
%ld2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2), align 4		%ld2 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2), align 4
%ld3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 3), align 2		%ld3 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 3), align 2
%ld4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4), align 8		%ld4 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4), align 8
%ld5 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 5), align 2		%ld5 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 5), align 2
%ld6 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6), align 4		%ld6 = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6), align 4
[14 lines not shown]
store float %cvt5, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4		store float %cvt5, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4
store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8		store float %cvt6, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4		store float %cvt7, float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
ret void		ret void
}		}

define void @uitofp_16i16_16f32() #0 {		define void @uitofp_16i16_16f32() #0 {
; SSE-LABEL: @uitofp_16i16_16f32(		; SSE-LABEL: @uitofp_16i16_16f32(
; SSE-NEXT: [[LD0:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 0), align 64		; SSE-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; SSE-NEXT: [[LD1:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 1), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16
; SSE-NEXT: [[LD2:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 2), align 4		; SSE-NEXT: [[TMP3:%.*]] = uitofp <8 x i16> [[TMP1]] to <8 x float>
; SSE-NEXT: [[LD3:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 3), align 2		; SSE-NEXT: [[TMP4:%.*]] = uitofp <8 x i16> [[TMP2]] to <8 x float>
; SSE-NEXT: [[LD4:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 4), align 8		; SSE-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64
; SSE-NEXT: [[LD5:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 5), align 2		; SSE-NEXT: store <8 x float> [[TMP4]], <8 x float>* bitcast (float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8) to <8 x float>*), align 32
; SSE-NEXT: [[LD6:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 6), align 4
; SSE-NEXT: [[LD7:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 7), align 2
; SSE-NEXT: [[LD8:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8), align 16
; SSE-NEXT: [[LD9:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 9), align 2
; SSE-NEXT: [[LD10:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 10), align 4
; SSE-NEXT: [[LD11:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 11), align 2
; SSE-NEXT: [[LD12:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 12), align 8
; SSE-NEXT: [[LD13:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 13), align 2
; SSE-NEXT: [[LD14:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 14), align 4
; SSE-NEXT: [[LD15:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 15), align 2
; SSE-NEXT: [[CVT0:%.*]] = uitofp i16 [[LD0]] to float
; SSE-NEXT: [[CVT1:%.*]] = uitofp i16 [[LD1]] to float
; SSE-NEXT: [[CVT2:%.*]] = uitofp i16 [[LD2]] to float
; SSE-NEXT: [[CVT3:%.*]] = uitofp i16 [[LD3]] to float
; SSE-NEXT: [[CVT4:%.*]] = uitofp i16 [[LD4]] to float
; SSE-NEXT: [[CVT5:%.*]] = uitofp i16 [[LD5]] to float
; SSE-NEXT: [[CVT6:%.*]] = uitofp i16 [[LD6]] to float
; SSE-NEXT: [[CVT7:%.*]] = uitofp i16 [[LD7]] to float
; SSE-NEXT: [[CVT8:%.*]] = uitofp i16 [[LD8]] to float
; SSE-NEXT: [[CVT9:%.*]] = uitofp i16 [[LD9]] to float
; SSE-NEXT: [[CVT10:%.*]] = uitofp i16 [[LD10]] to float
; SSE-NEXT: [[CVT11:%.*]] = uitofp i16 [[LD11]] to float
; SSE-NEXT: [[CVT12:%.*]] = uitofp i16 [[LD12]] to float
; SSE-NEXT: [[CVT13:%.*]] = uitofp i16 [[LD13]] to float
; SSE-NEXT: [[CVT14:%.*]] = uitofp i16 [[LD14]] to float
; SSE-NEXT: [[CVT15:%.*]] = uitofp i16 [[LD15]] to float
; SSE-NEXT: store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
; SSE-NEXT: store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
; SSE-NEXT: store float [[CVT2]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 2), align 8
; SSE-NEXT: store float [[CVT3]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 3), align 4
; SSE-NEXT: store float [[CVT4]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 4), align 16
; SSE-NEXT: store float [[CVT5]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 5), align 4
; SSE-NEXT: store float [[CVT6]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 6), align 8
; SSE-NEXT: store float [[CVT7]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 7), align 4
; SSE-NEXT: store float [[CVT8]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 8), align 32
; SSE-NEXT: store float [[CVT9]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 9), align 4
; SSE-NEXT: store float [[CVT10]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 10), align 8
; SSE-NEXT: store float [[CVT11]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 11), align 4
; SSE-NEXT: store float [[CVT12]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 12), align 16
; SSE-NEXT: store float [[CVT13]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 13), align 4
; SSE-NEXT: store float [[CVT14]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 14), align 8
; SSE-NEXT: store float [[CVT15]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 15), align 4
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX256-LABEL: @uitofp_16i16_16f32(		; AVX256-LABEL: @uitofp_16i16_16f32(
; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64		; AVX256-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* bitcast ([32 x i16]* @src16 to <8 x i16>*), align 64
; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16		; AVX256-NEXT: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @src16, i32 0, i64 8) to <8 x i16>*), align 16
; AVX256-NEXT: [[TMP3:%.*]] = uitofp <8 x i16> [[TMP1]] to <8 x float>		; AVX256-NEXT: [[TMP3:%.*]] = uitofp <8 x i16> [[TMP1]] to <8 x float>
; AVX256-NEXT: [[TMP4:%.*]] = uitofp <8 x i16> [[TMP2]] to <8 x float>		; AVX256-NEXT: [[TMP4:%.*]] = uitofp <8 x i16> [[TMP2]] to <8 x float>
; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64		; AVX256-NEXT: store <8 x float> [[TMP3]], <8 x float>* bitcast ([16 x float]* @dst32 to <8 x float>*), align 64