This is an archive of the discontinued LLVM Phabricator instance.

clang/X86: Don't emit "min-legal-vector-width"="0"
Needs Revision · Public

Authored by arsenm on Dec 8 2022, 5:49 AM.

Details

Summary

This should be NFC as far as clang end user experience is concerned,
but it does change X86's interpretation of IR that doesn't explicitly
specify the attribute.

This is clutter that's always annoyed me. I have no idea what this
does and just don't want to see it anymore. This was previously
attempted in D97116, which was reverted; that patch didn't attempt to
deal with the X86 backend's backwards treatment of the unset-attribute
case. I tried out the reported regressing sample, and end to end I get
the same result before and after.

I have no idea what this attribute means, and it seems to be very x86
specific spaghetti spread all around. It's constantly getting
recomputed. The IR documentation was only recently added, and makes no
sense to me. I have no idea what "legal" means in this context. At
first glance this corresponds to clang's min_vector_width attribute,
with a slightly different name. There seems to be an aggressive amount
of reinterpretation of this value.

The adventure begins in the calling convention lowering, in
X86_64ABIInfo::classifyRegCallStructType, which computes a
"MaxVectorWidth" and updates it based on the C types of certain
arguments. That value is then passed to
CGFunctionInfo::setMaxVectorWidth.
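
In rough outline, that computation looks like the following. This is a simplified sketch of the shape of the reduction, not the real clang code: structFields is a hypothetical stand-in for the record-walking logic, which also recurses into nested aggregates.

```cpp
// Sketch of the regcall classification reduction (simplified; the real
// code is X86_64ABIInfo::classifyRegCallStructType and tracks more state).
uint64_t MaxVectorWidth = 0;
for (QualType FieldTy : structFields(Ty)) {  // structFields is hypothetical
  if (FieldTy->isVectorType()) {
    // Track the widest vector encountered in the aggregate, in bits.
    MaxVectorWidth = std::max(MaxVectorWidth, getContext().getTypeSize(FieldTy));
  }
}
FI.setMaxVectorWidth(MaxVectorWidth);  // stashed on the CGFunctionInfo
```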

CodeGenFunction::StartFunction then initializes LargestVectorWidth
based on MinVectorWidthAttr. CodeGenFunction::EmitCall computes a max
vector width again, based on IR types, for call sites? Yet another
reduction over the IR argument and return types is (redundantly?)
performed before final emission of the attribute. The result is then
raised to the value apparently computed during the initial calling
convention lowering?
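
Pieced together, the final emission step behaves roughly like this. A simplified sketch modeled on CodeGenFunction::FinishFunction; error handling, and the pointer-to-vector gaps mentioned below, are elided.

```cpp
// Sketch of the last reduction before the attribute is written out.
uint64_t Largest = LargestVectorWidth;  // seeded in StartFunction

// Yet another walk over the IR argument and return types.
for (llvm::Argument &Arg : CurFn->args())
  if (auto *VT = llvm::dyn_cast<llvm::VectorType>(Arg.getType()))
    Largest = std::max(Largest,
                       VT->getPrimitiveSizeInBits().getKnownMinValue());
if (auto *VT = llvm::dyn_cast<llvm::VectorType>(CurFn->getReturnType()))
  Largest = std::max(Largest,
                     VT->getPrimitiveSizeInBits().getKnownMinValue());

// Raised to the value computed during calling convention lowering.
Largest = std::max(Largest, (uint64_t)CurFnInfo->getMaxVectorWidth());

CurFn->addFnAttr("min-legal-vector-width", llvm::utostr(Largest));
```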

In the middle end there's some additional insanity with attribute
propagation, along with further argument type reductions. Finally, the
actual use modifies the X86Subtarget construction. It defaults to
UINT_MAX if the attribute isn't set, contrary to the 0 apparently used
as the default everywhere else. The point at which the subtarget is
constructed isn't well defined, so the underlying function's
min-legal-vector-width may have changed later during these attribute
propagation passes. Most of this code also consistently fails to
handle vectors of pointers correctly.
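
For contrast, the consuming side in the backend looks roughly like this. A simplified sketch of the attribute parsing in X86TargetMachine::getSubtargetImpl; the subtarget-key bookkeeping is elided and details may differ.

```cpp
// Sketch: how X86 reads the attribute when building a subtarget.
unsigned RequiredVectorWidth = UINT32_MAX;  // unset means "trust nothing",
                                            // the opposite of the 0 default
                                            // used in the frontend
llvm::Attribute WidthAttr = F.getFnAttribute("min-legal-vector-width");
if (WidthAttr.isValid()) {
  unsigned Width;
  // The value is stored as a string; a parse failure leaves UINT32_MAX.
  if (!WidthAttr.getValueAsString().getAsInteger(0, Width))
    RequiredVectorWidth = Width;
}
// RequiredVectorWidth feeds the subtarget cache key, so functions that
// differ only in this attribute get distinct X86Subtarget instances.
```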

The documentation and commit message for D123284 make no sense to
me. They imply it's an ABI attribute, but the fact that it changes in
so many places, and that clang doesn't emit it on declarations,
implies it can't be one.

This avoids the attribute in the most common case, but it's still
emitted for any function with a vector in the arguments or return
type. It should perhaps be moved into x86 specific attribute
emission. It would be reasonable to treat this as a generic attribute
if it were a 1:1 mapping from the source level attribute, merely
passed through to the backend. Is there a reason we can't just have it
pass through and let x86 do the IR argument type inspection later? To
minimize test churn, I added yet another reduction to handle
unannotated functions in the subtarget construction. I believe this
should be removed and the remaining tests updated.

The net effect of all this complexity seems to be pretty small, only
changing the handling of very large vectors on subtargets with avx512.
I updated the tests by merely placing min-legal-vector-width=512 (some
of the test failures without this looked reasonable to just update,
though they looked like regressions).

Diff Detail

Event Timeline

arsenm created this revision. Dec 8 2022, 5:49 AM
Herald added a project: Restricted Project. Dec 8 2022, 5:49 AM
arsenm requested review of this revision. Dec 8 2022, 5:49 AM
Herald added a subscriber: wdng.
pengfei requested changes to this revision. Edited Dec 8 2022, 7:20 AM

The use of min-legal-vector-width doesn't look great to me either. I'm more than glad to remove it entirely if we can do so without any user perceivable effects.
I cannot agree with this change, because it neither eliminates the clutter (it makes it even worse [1]) nor is NFC to the end user.
I think we used UINT32_MAX just to be compatible with BC files generated before the attribute was introduced. This change definitely breaks that compatibility.
Placing a "min-legal-vector-width"="512" doesn't make any sense either. For one thing, we cannot place the attribute in pre-built BC files; for another, 512 is the current maximum vector width supported on X86. We cannot guarantee there will be no 1024, 2048, etc. in the future, and we cannot change the value once it has been compiled into BC files.

[1] "min-legal-vector-width"="0" clearly indicated there are only scalar operations. Dropping it is not clear, especially while we still set other values like 128, 256, etc.

This revision now requires changes to proceed. Dec 8 2022, 7:20 AM

The attribute is supposed to tell the backend if there were any vectors used in the C source code and how big they were. A value of 0 means the code didn't contain any vectors.

The backend assumes that lack of the attribute means the IR didn't come from clang and can't be trusted to have been audited. That's why the backend uses UINT32_MAX.

If the attribute is present and less than 512, avx512 is enabled, and we're targeting certain CPUs with an avx512 frequency penalty, the backend type legalizer in X86 will not have 512 bit vectors as legal types. This will cause any 512-bit or wider vectors to be split. Such vectors were likely emitted by the loop or SLP vectorizer, which isn't bound to the legal vector width.
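
Concretely, the subtarget ends up gating 512-bit register usage on a predicate shaped like the following (simplified from X86Subtarget; member names are approximate):

```cpp
// 512-bit vector types are only treated as legal when the CPU has no
// frequency-penalty-driven preference for narrower vectors, or the
// function's min-legal-vector-width demands more than 256 bits.
bool useAVX512Regs() const {
  return hasAVX512() && (canExtendTo512DQ() || RequiredVectorWidth > 256);
}
```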

If the C source used a 512 bit vector explicitly, whether via a target specific intrinsic, a function argument or return, inline assembly, etc., we need to have the backend treat the type as legal. If we don't do that, splitting the type could be incorrect. It could cause an ABI break across translation units, or cause the backend legalizer to need to split something it isn't capable of splitting, like a target specific intrinsic.
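
For illustration, a hypothetical translation unit; the attribute values in the comments are what the scheme described above implies, not verified compiler output:

```cpp
// 512-bit vector type via clang's vector_size extension.
typedef float v16f32 __attribute__((vector_size(64)));

// A 512-bit vector crosses the call boundary here, so clang would emit
// "min-legal-vector-width"="512", forcing the backend to keep v16f32
// legal instead of splitting it (splitting only one side of a call
// would break the ABI between translation units).
v16f32 add_wide(v16f32 a, v16f32 b) { return a + b; }

// No vectors anywhere: under the current scheme this function gets
// "min-legal-vector-width"="0", the clutter this patch stops emitting.
int add_scalar(int a, int b) { return a + b; }
```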

I admit the naming is unfortunate. I think I called the attribute min-legal-vector-width because it is the "minimum vector width the backend needs to consider legal", still subject to what subtarget features actually provide.

The clang code uses Max because the value is calculated by taking the max of a bunch of things.

Isn't this (inherently) X86 specific?

> Isn't this (inherently) X86 specific?

Yes it is. We could qualify the attribute emission on the target being X86?

> The use of min-legal-vector-width doesn't look great to me either. I'm more than glad to remove it entirely if we can do so without any user perceivable effects.
> I cannot agree with this change, because it neither eliminates the clutter (it makes it even worse [1]) nor is NFC to the end user.
> I think we used UINT32_MAX just to be compatible with BC files generated before the attribute was introduced. This change definitely breaks that compatibility.

What I'm getting from this is that it's only a performance hint, and it definitively doesn't matter for ABI purposes. Bitcode backwards performance compatibility is not important. That would also be recovered by having proper attribute propagation done as an optimization.

I think all of clang's handling of this should be purged, except for the part where it's passing through the user attribute.

Placing a "min-legal-vector-width" = "512" doesn't make any sense either. For one thing, we cannot place the attribute in pre-built BC files, for another 512 is the current max vector suppoted on X86, we cannot guarantee no 1024, 2048 etc. in future and we cannot change it too once compiled into BC files.

It's a test for specific behavior, with a specific configuration that exists today. It doesn't matter what this would be in the future for larger test cases.

[1] "min-legal-vector-width" = "0" was clear to indicate there are only scalar operations.

It's not remotely clear what this means

[1] "min-legal-vector-width" = "0" was clear to indicate there are only scalar operations.

It's not remotely clear what this means

It also doesn't mean that, because the IR doesn't have to be consistent with the attribute. The IR exists independent of the attribute, and the attribute can only provide performance hints.

>> Isn't this (inherently) X86 specific?
> Yes it is. We could qualify the attribute emission on the target being X86?

That would hide the annoyance from me, but I still think the implementation of this concept is poor and it should be treated like other optimization hint attributes.

[1] "min-legal-vector-width" = "0" was clear to indicate there are only scalar operations.

It's not remotely clear what this means

It also doesn't mean that, because the IR doesn't have to be consistent with the attribute. The IR exists independent of the attribute, and the attribute can only provide performance hints.

I don't agree. There are attributes like zeroext, byref are ABI related see https://llvm.org/docs/LangRef.html#parameter-attributes. I'd take min-legal-vector-width as a similar one.

pengfei added a comment. Edited Dec 9 2022, 12:59 AM

>> Isn't this (inherently) X86 specific?
> Yes it is. We could qualify the attribute emission on the target being X86?

That's a good idea. How about AArch64? I found there are uses of it in the test cases there.
I put up a patch, D139701, for AMDGPU only.

>> It also doesn't mean that, because the IR doesn't have to be consistent with the attribute. The IR exists independent of the attribute, and the attribute can only provide performance hints.
> I don't agree. There are attributes like zeroext and byref that are ABI related; see https://llvm.org/docs/LangRef.html#parameter-attributes. I'd take min-legal-vector-width as a similar one.

This is not an ABI attribute, it's an optimization hint. If it were an ABI attribute, inferring and propagating it the way it's currently done would not be correct. If you treat it like an optimization hint, you only have to consider this one place instead of everywhere the IR may change.

>>> It also doesn't mean that, because the IR doesn't have to be consistent with the attribute. The IR exists independent of the attribute, and the attribute can only provide performance hints.
>> I don't agree. There are attributes like zeroext and byref that are ABI related; see https://llvm.org/docs/LangRef.html#parameter-attributes. I'd take min-legal-vector-width as a similar one.
> This is not an ABI attribute, it's an optimization hint. If it were an ABI attribute, inferring and propagating it the way it's currently done would not be correct. If you treat it like an optimization hint, you only have to consider this one place instead of everywhere the IR may change.

Those are also parameter attributes; this is a function attribute.

>>> It also doesn't mean that, because the IR doesn't have to be consistent with the attribute. The IR exists independent of the attribute, and the attribute can only provide performance hints.
>> I don't agree. There are attributes like zeroext and byref that are ABI related; see https://llvm.org/docs/LangRef.html#parameter-attributes. I'd take min-legal-vector-width as a similar one.
> This is not an ABI attribute, it's an optimization hint. If it were an ABI attribute, inferring and propagating it the way it's currently done would not be correct. If you treat it like an optimization hint, you only have to consider this one place instead of everywhere the IR may change.

If the caller and callee calculate different values and one of them is less than the width of a vector argument, it will cause an ABI break. The type legalizer in SelectionDAG will split one and not the other. Maybe the backend should check the IR for the arguments/returns and increase the min-legal-vector-width if it's less than the argument width and the argument width is supported by the AVX level.
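
A sketch of what that backend-side fixup could look like (hypothetical code, not anything that exists today; clampToSignature is an invented name):

```cpp
// Hypothetical: never let the effective width be smaller than what the
// IR signature visibly requires, regardless of the attribute value.
static unsigned clampToSignature(const llvm::Function &F, unsigned Width) {
  auto AccountFor = [&Width](llvm::Type *Ty) {
    if (auto *VT = llvm::dyn_cast<llvm::VectorType>(Ty))
      Width = std::max(
          Width, (unsigned)VT->getPrimitiveSizeInBits().getKnownMinValue());
  };
  AccountFor(F.getReturnType());
  for (const llvm::Argument &Arg : F.args())
    AccountFor(Arg.getType());
  return Width;  // the caller would still cap this at what the AVX level
                 // actually supports
}
```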

> If the caller and callee calculate different values and one of them is less than the width of a vector argument, it will cause an ABI break. The type legalizer in SelectionDAG will split one and not the other. Maybe the backend should check the IR for the arguments/returns and increase the min-legal-vector-width if it's less than the argument width and the argument width is supported by the AVX level.

Right, I think the backend needs to fix up whatever it needs for hard requirements. The current implementation, which treats it like hard ABI any time a function signature changes, isn't scalable. Every possible transform that could introduce a call site would need to handle this.