This is an archive of the discontinued LLVM Phabricator instance.

[X86] Lowering X86 avx512 sqrt intrinsics to IR - LLVM
ClosedPublic

Authored by tkrupa on Dec 27 2017, 6:57 AM.


Details

Reviewers

craig.topper
spatel
RKSimon
DavidKreitzer
uriel.k

Commits

rGbcaab53d479e: [X86] Lowering sqrt intrinsics to native IR
rL334849: [X86] Lowering sqrt intrinsics to native IR

Summary

Together with a matching clang patch, this lowers the sqrt intrinsics to native IR.
Note that for the scalar type an extra move instruction is generated; it should be removed by a separate patch.

This patch removes the sqrt intrinsics and adds support in AutoUpgrade.cpp for backward compatibility.

Diff Detail

Repository: rL LLVM

Event Timeline

uriel.k created this revision. Dec 27 2017, 6:57 AM

Won't this mean that explicit calls to the SSE sqrt intrinsics may be converted to the rsqrt+NR estimates in some cases?

RKSimon added inline comments. Dec 28 2017, 8:36 AM

test/CodeGen/X86/sse-intrinsics-x86.ll
476 ↗	(On Diff #128233)	Why did you move this test?
test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
2954 ↗	(On Diff #128233)	Shouldn't that be llvm.sqrt.v2f64?
test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll
20 ↗	(On Diff #128233)	Strip these checks and regenerate

uriel.k updated this revision to Diff 128383. Jan 1 2018, 1:12 AM

uriel.k marked 3 inline comments as done.

In D41599#964739, @RKSimon wrote:

Won't this mean that explicit calls to the SSE sqrt intrinsics may be converted to the rsqrt+NR estimates in some cases?

Yes, this is expected; it is what we are aiming for by lowering the intrinsics to IR. We want the compiler to make the better decision and get better performance.
Correct me if I am missing something special about this intrinsic.
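
For readers unfamiliar with the rsqrt+NR sequence discussed here: under fast-math the backend may replace an exact square root with a hardware reciprocal-square-root estimate refined by a Newton-Raphson step. The following is a minimal scalar sketch of that refinement in C++. The function names and the way the starting estimate is passed in are illustrative assumptions, not LLVM code; the real sequence starts from the output of an rsqrt instruction.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for y ~= 1/sqrt(x):  y' = y * (1.5 - 0.5 * x * y * y).
// `estimate` stands in for the low-precision result of an rsqrt instruction.
inline float refine_rsqrt(float x, float estimate) {
  return estimate * (1.5f - 0.5f * x * estimate * estimate);
}

// sqrt(x) recovered as x * (1/sqrt(x)). The result is close to std::sqrt(x)
// but is not guaranteed to match it bit for bit.
inline float sqrt_from_rsqrt(float x, float estimate) {
  return x * refine_rsqrt(x, estimate);
}
```

For example, `sqrt_from_rsqrt(2.0f, 0.70f)` lands much closer to `std::sqrt(2.0f)` than the unrefined product `2.0f * 0.70f`, yet it can still differ from the exact result in the low bits, which is the kind of result difference being asked about in this thread.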

test/CodeGen/X86/sse-intrinsics-x86.ll
476 ↗	(On Diff #128233)	You are right, my mistake. fixed.
test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
2954 ↗	(On Diff #128233)	fixed.

Simon, is there anything else you think needs to be changed before accepting the revision?
Thanks

In D41599#976062, @uriel.k wrote:

Simon, is there anything else you think needs to be changed before accepting the revision?
Thanks

I'm still a little worried about this. It can create a lot more bit differences in results than previous intrinsics that we've replaced with generic implementations. I guess _mm_div_ps already does this to an extent (as do other fadd/fsub/fmul cases via re-association etc.).

Maybe I'm just being a little overcautious, but at the very least I'd like to see D41168 update the intrinsic documentation to explain that -ffast-math may result in rsqrt+NR codegen under some circumstances. It still says that (v)sqrtps will be generated.
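
The re-association effect mentioned above can be illustrated with a small standalone C++ sketch. The helper names are hypothetical; the point is only that floating-point addition is not associative, so a compiler that is allowed to regroup operands under fast-math can change the result.

```cpp
#include <cassert>

// Left-to-right grouping: (a + b) + c.
inline float sum_left(float a, float b, float c) {
  volatile float t = a + b;  // volatile forces the intermediate to be rounded to float
  return t + c;
}

// Re-associated grouping: a + (b + c).
inline float sum_right(float a, float b, float c) {
  volatile float t = b + c;  // here c can be absorbed when b is much larger in magnitude
  return a + t;
}
```

With `a = 1e8f`, `b = -1e8f`, `c = 1.0f`, the first grouping cancels `a` and `b` exactly before adding `c`, while in the second grouping `c` is lost when it is added to `b`, because adjacent floats near 1e8 are 8 apart.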

In D41599#976338, @RKSimon wrote:

I'm still a little worried about this. It can create a lot more bit differences in results than previous intrinsics that we've replaced with generic implementations. I guess _mm_div_ps already does this to an extent (as do other fadd/fsub/fmul cases via re-association etc.).

Maybe I'm just being a little overcautious, but at the very least I'd like to see D41168 update the intrinsic documentation to explain that -ffast-math may result in rsqrt+NR codegen under some circumstances. It still says that (v)sqrtps will be generated.

We want to allow more transforms with -ffast-math regardless of whether the source used intrinsics, so we shouldn’t treat sqrt differently than other math ops.

That said, updating the clang header docs is necessary too. We’re almost certainly going to see fallout from this change and get bug reports.
One nit: these patches are titled ‘avx512’ when they apply to all avx/sse. When committing, I recommend splitting these patches up by intrinsic (if not individually, then at least by sse/avx/avx512). This will reduce the risk that the whole thing gets reverted…that will also make it easier to pinpoint exactly which intrinsic is under investigation when the complaints come in.

igorb added a reviewer: DavidKreitzer. Jan 17 2018, 4:58 AM

Looking at clang's CGBuiltin.cpp we do have precedent for using Intrinsic::sqrt for builtins for AArch64, PowerPC, and SystemZ.

include/llvm/IR/IntrinsicsX86.td
4497 ↗	(On Diff #128383)	Why are we renaming intrinsics here? Is this done to purposely exclude the AVX512 intrinsics? Why are we doing that?

mike.dvoretsky added a subscriber: mike.dvoretsky. Jan 29 2018, 7:29 AM

mike.dvoretsky added inline comments.

include/llvm/IR/IntrinsicsX86.td
4497 ↗	(On Diff #128383)	It seems to me that this is done to avoid unconditionally generating the intrinsic in CodeGenFunction::EmitBuiltinExpr in CGBuiltin.cpp on the clang side while keeping the intrinsic available in IR for cases where the rounding mode isn't 4 and it's not being lowered. I haven't been able to find other intrinsic-lowering patches that take measures to keep the intrinsic available rather than just deleting it from IR, so I can't say if this change is conventional. If it isn't, then we need to either look into changing the algorithm in EmitBuiltinExpr to check for lowering before checking if llvm supports the intrinsic, or propose a renaming convention for cases like this one. In the latter case I would propose to put "nonlowered" in the names after the target prefix to keep these distinguishable as renamed, rather than aiming for a similar name and confusing people.
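
The rounding-mode check described in this comment can be pictured with the following C++ sketch. It is a hypothetical model of the decision, not clang's actual EmitBuiltinExpr code; the enum and function names are invented for illustration. The only fact relied on is that 4 is the value of _MM_FROUND_CUR_DIRECTION, meaning "use the current rounding mode".

```cpp
#include <cassert>

// 4 is the value of _MM_FROUND_CUR_DIRECTION (use the rounding mode in MXCSR).
constexpr int kRoundCurDirection = 4;

// Hypothetical labels for the two outcomes discussed in the review.
enum class SqrtLowering { GenericSqrt, TargetIntrinsic };

// Hypothetical helper: with the default rounding operand the builtin can be
// expressed with the generic sqrt intrinsic; any other rounding operand has to
// keep the target-specific intrinsic, which therefore must stay defined in IR.
constexpr SqrtLowering pickSqrtLowering(int roundingMode) {
  return roundingMode == kRoundCurDirection ? SqrtLowering::GenericSqrt
                                            : SqrtLowering::TargetIntrinsic;
}
```

This is why the intrinsics that carry a rounding operand cannot simply be deleted: some call sites still need them.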

mike.dvoretsky added inline comments. Feb 7 2018, 9:16 AM

include/llvm/IR/IntrinsicsX86.td
4497 ↗	(On Diff #128383)	Looks like a better method to preserve the intrinsics exists for this case. Instead of renaming them, one may simply remove the GCCBuiltin template from their def's here and leave them untouched in X86IntrinsicsInfo.h. That method should be made conventional for patches like this and D41168.

@uriel.k Sorry, this patch and D41168 fell off my radar for a while - are you still looking at this?

In D41599#1041200, @RKSimon wrote:

@uriel.k Sorry, this patch and D41168 fell off my radar for a while - are you still looking at this?

Sorry, I've been away for a while.
There will be someone else replacing me.
I'm not sure if he will continue this revision or start a new one, so for now I'll leave this open.

Thanks for asking

Uriel

tkrupa added a subscriber: tkrupa. Mar 30 2018, 2:23 AM

I was assigned to finish this task. Is it possible to set me as an author of this ticket or do I need to open a new one?

In D41599#1052744, @tkrupa wrote:

I was assigned to finish this task. Is it possible to set me as an author of this ticket or do I need to open a new one?

You can commandeer this patch to set yourself as the author. Look under the 'Add Action...' tab on Phabricator.

tkrupa commandeered this revision. Mar 30 2018, 4:28 AM

tkrupa added a reviewer: uriel.k.

As mike.dvoretsky suggested, I reverted the renaming of those 4 round intrinsics. Instead, I removed their binding to GCC builtins in IntrinsicsX86.td. This way, they don't get lowered in AutoUpgrade.cpp, but new code emitted with clang still gets lowered. Besides that, I changed the comments in AutoUpgrade.cpp from "Added 6.0" to "Added in 7.0".

craig.topper added inline comments. Apr 4 2018, 10:21 AM

test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4 ↗	(On Diff #140404)	This patch doesn't appear to have removed the scalar intrinsics. I don't see any AutoUpgrade code or removal from X86IntrinsicsInfo.h.

tkrupa added inline comments. Apr 5 2018, 1:42 AM

test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4 ↗	(On Diff #140404)	You're right, they only get lowered in clang. I gave the reasoning in the comment on the last upload. Is it enough to just move these 4 tests back to test/CodeGen/X86/avx512-intrinsics.ll, or is it crucial to also lower them in the LLVM part?

craig.topper added inline comments. Apr 5 2018, 10:04 AM

test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4 ↗	(On Diff #140404)	You can just move them back. But if clang isn't using them, make sure the GCCBuiltin is removed from IntrinsicsX86.td and leave a FIXME saying that they can be removed. There are a lot of FIXMEs like that in that file already.

tkrupa updated this revision to Diff 141287. Apr 6 2018, 12:47 AM

tkrupa marked an inline comment as done.

craig.topper added inline comments. Apr 6 2018, 9:40 AM

include/llvm/IR/IntrinsicsX86.td
4437 ↗	(On Diff #141287)	Isn't clang still using this one when the rounding mode is non-default?
4440 ↗	(On Diff #141287)	Same with this one?

tkrupa added inline comments. Apr 8 2018, 11:52 PM

include/llvm/IR/IntrinsicsX86.td
4437 ↗	(On Diff #141287)	It does, ss and sd intrinsics also do. The GCCBuiltin binds needed to be removed to enable lowering in AutoUpgrade but yeah, these definitions should stay. Is erasing the FIXME enough or should there be some note to not remove them?

craig.topper added inline comments. Apr 9 2018, 9:42 AM

include/llvm/IR/IntrinsicsX86.td
4437 ↗	(On Diff #141287)	Removing the FIXME should be enough. If anyone tries to delete it, they'll get a build error in clang.

Removed FIXME annotations.

Harbormaster completed remote builds in B16974: Diff 142013. Apr 11 2018, 8:07 AM

LGTM

This revision is now accepted and ready to land. Apr 11 2018, 10:24 AM

Added lowering of scalar sqrt intrinsics without rounding (relies on D47621).

craig.topper added inline comments. Jun 3 2018, 3:28 PM

lib/IR/AutoUpgrade.cpp
100 ↗	(On Diff #149416)	The sse.sqrt.ss and sse2.sqrt.sd intrinsics are still in IntrinsicsX86.td

tkrupa updated this revision to Diff 149925. Jun 5 2018, 3:38 AM

tkrupa marked an inline comment as done.

LGTM

This revision is now accepted and ready to land. Jun 8 2018, 9:25 AM

Closed by commit rL334849: [X86] Lowering sqrt intrinsics to native IR (authored by tkrupa). Jun 15 2018, 11:12 AM

This revision was automatically updated to reflect the committed changes.

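Revision Contents

Path	Size
llvm/trunk/include/llvm/IR/IntrinsicsX86.td	37 lines
llvm/trunk/lib/IR/AutoUpgrade.cpp	32 lines
llvm/trunk/lib/Target/X86/X86InstrAVX512.td	23 lines
llvm/trunk/lib/Target/X86/X86InstrSSE.td	86 lines
llvm/trunk/lib/Target/X86/X86IntrinsicsInfo.h	8 lines
llvm/trunk/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp	2 lines
llvm/trunk/test/CodeGen/X86/avx-intrinsics-fast-isel.ll	16 lines
llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll	30 lines
llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86.ll	33 lines
llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll	176 lines
llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-upgrade.ll	37 lines
llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics.ll	37 lines
llvm/trunk/test/CodeGen/X86/fold-load-unops.ll	12 lines
llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel.ll	12 lines
llvm/trunk/test/CodeGen/X86/sse-intrinsics-x86-upgrade.ll	38 lines
llvm/trunk/test/CodeGen/X86/sse-intrinsics-x86.ll	42 lines
llvm/trunk/test/CodeGen/X86/sse-scalar-fp-arith.ll	24 lines
llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll	16 lines
llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll	101 lines
llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86.ll	87 lines
llvm/trunk/test/CodeGen/X86/sse_partial_update.ll	10 lines
llvm/trunk/test/Transforms/InstCombine/X86/x86-sse.ll	6 lines
llvm/trunk/test/Transforms/InstCombine/X86/x86-sse2.ll	6 lines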

Diff 151536

llvm/trunk/include/llvm/IR/IntrinsicsX86.td


@@ 174 lines hidden @@ def int_x86_3dnowa_pswapd :
 Intrinsic<[llvm_x86mmx_ty], [llvm_x86mmx_ty], [IntrNoMem]>;
 }

 //===----------------------------------------------------------------------===//
 // SSE1

 // Arithmetic ops
 let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".
-def int_x86_sse_sqrt_ss : GCCBuiltin<"__builtin_ia32_sqrtss">,
-Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty],
-[IntrNoMem]>;
-def int_x86_sse_sqrt_ps : GCCBuiltin<"__builtin_ia32_sqrtps">,
-Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty],
-[IntrNoMem]>;
 def int_x86_sse_rcp_ss : GCCBuiltin<"__builtin_ia32_rcpss">,
 Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty],
 [IntrNoMem]>;
 def int_x86_sse_rcp_ps : GCCBuiltin<"__builtin_ia32_rcpps">,
 Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty],
 [IntrNoMem]>;
 def int_x86_sse_rsqrt_ss : GCCBuiltin<"__builtin_ia32_rsqrtss">,
 Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty],
@@ 102 lines hidden @@ def int_x86_sse_movmsk_ps : GCCBuiltin<"__builtin_ia32_movmskps">,
 Intrinsic<[llvm_i32_ty], [llvm_v4f32_ty], [IntrNoMem]>;
 }

 //===----------------------------------------------------------------------===//
 // SSE2

 // FP arithmetic ops
 let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".
-def int_x86_sse2_sqrt_sd : GCCBuiltin<"__builtin_ia32_sqrtsd">,
-Intrinsic<[llvm_v2f64_ty], [llvm_v2f64_ty],
-[IntrNoMem]>;
-def int_x86_sse2_sqrt_pd : GCCBuiltin<"__builtin_ia32_sqrtpd">,
-Intrinsic<[llvm_v2f64_ty], [llvm_v2f64_ty],
-[IntrNoMem]>;
 def int_x86_sse2_min_sd : GCCBuiltin<"__builtin_ia32_minsd">,
 Intrinsic<[llvm_v2f64_ty], [llvm_v2f64_ty,
 llvm_v2f64_ty], [IntrNoMem]>;
 def int_x86_sse2_min_pd : GCCBuiltin<"__builtin_ia32_minpd">,
 Intrinsic<[llvm_v2f64_ty], [llvm_v2f64_ty,
 llvm_v2f64_ty], [IntrNoMem]>;
 def int_x86_sse2_max_sd : GCCBuiltin<"__builtin_ia32_maxsd">,
 Intrinsic<[llvm_v2f64_ty], [llvm_v2f64_ty,
@@ 635 lines hidden @@ def int_x86_avx_max_ps_256 : GCCBuiltin<"__builtin_ia32_maxps256">,
 llvm_v8f32_ty], [IntrNoMem]>;
 def int_x86_avx_min_pd_256 : GCCBuiltin<"__builtin_ia32_minpd256">,
 Intrinsic<[llvm_v4f64_ty], [llvm_v4f64_ty,
 llvm_v4f64_ty], [IntrNoMem]>;
 def int_x86_avx_min_ps_256 : GCCBuiltin<"__builtin_ia32_minps256">,
 Intrinsic<[llvm_v8f32_ty], [llvm_v8f32_ty,
 llvm_v8f32_ty], [IntrNoMem]>;

-def int_x86_avx_sqrt_pd_256 : GCCBuiltin<"__builtin_ia32_sqrtpd256">,
-Intrinsic<[llvm_v4f64_ty], [llvm_v4f64_ty], [IntrNoMem]>;
-def int_x86_avx_sqrt_ps_256 : GCCBuiltin<"__builtin_ia32_sqrtps256">,
-Intrinsic<[llvm_v8f32_ty], [llvm_v8f32_ty], [IntrNoMem]>;

 def int_x86_avx_rsqrt_ps_256 : GCCBuiltin<"__builtin_ia32_rsqrtps256">,
 Intrinsic<[llvm_v8f32_ty], [llvm_v8f32_ty], [IntrNoMem]>;

 def int_x86_avx_rcp_ps_256 : GCCBuiltin<"__builtin_ia32_rcpps256">,
 Intrinsic<[llvm_v8f32_ty], [llvm_v8f32_ty], [IntrNoMem]>;

 def int_x86_avx_round_pd_256 : GCCBuiltin<"__builtin_ia32_roundpd256">,
 Intrinsic<[llvm_v4f64_ty], [llvm_v4f64_ty,
@@ 2,886 lines hidden @@ def int_x86_avx512_mask_scalef_ps_128 : GCCBuiltin<"__builtin_ia32_scalefps128_mask">,
 llvm_v4f32_ty, llvm_i8_ty], [IntrNoMem]>;
 def int_x86_avx512_mask_scalef_ps_256 : GCCBuiltin<"__builtin_ia32_scalefps256_mask">,
 Intrinsic<[llvm_v8f32_ty], [llvm_v8f32_ty, llvm_v8f32_ty,
 llvm_v8f32_ty, llvm_i8_ty], [IntrNoMem]>;
 def int_x86_avx512_mask_scalef_ps_512 : GCCBuiltin<"__builtin_ia32_scalefps512_mask">,
 Intrinsic<[llvm_v16f32_ty], [llvm_v16f32_ty, llvm_v16f32_ty,
 llvm_v16f32_ty, llvm_i16_ty, llvm_i32_ty], [IntrNoMem]>;

-def int_x86_avx512_mask_sqrt_ss : GCCBuiltin<"__builtin_ia32_sqrtss_round_mask">,
+def int_x86_avx512_mask_sqrt_ss :
 Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty, llvm_v4f32_ty, llvm_v4f32_ty,
 llvm_i8_ty, llvm_i32_ty], [IntrNoMem]>;
-def int_x86_avx512_mask_sqrt_sd : GCCBuiltin<"__builtin_ia32_sqrtsd_round_mask">,
+def int_x86_avx512_mask_sqrt_sd :
 Intrinsic<[llvm_v2f64_ty], [llvm_v2f64_ty, llvm_v2f64_ty, llvm_v2f64_ty,
 llvm_i8_ty, llvm_i32_ty], [IntrNoMem]>;

-def int_x86_avx512_mask_sqrt_pd_128 : GCCBuiltin<"__builtin_ia32_sqrtpd128_mask">,
-Intrinsic<[llvm_v2f64_ty], [llvm_v2f64_ty, llvm_v2f64_ty,
-llvm_i8_ty], [IntrNoMem]>;
-def int_x86_avx512_mask_sqrt_pd_256 : GCCBuiltin<"__builtin_ia32_sqrtpd256_mask">,
-Intrinsic<[llvm_v4f64_ty], [llvm_v4f64_ty, llvm_v4f64_ty,
-llvm_i8_ty], [IntrNoMem]>;
-def int_x86_avx512_mask_sqrt_pd_512 : GCCBuiltin<"__builtin_ia32_sqrtpd512_mask">,
+def int_x86_avx512_mask_sqrt_pd_512 :
 Intrinsic<[llvm_v8f64_ty], [llvm_v8f64_ty, llvm_v8f64_ty,
 llvm_i8_ty, llvm_i32_ty], [IntrNoMem]>;
-def int_x86_avx512_mask_sqrt_ps_128 : GCCBuiltin<"__builtin_ia32_sqrtps128_mask">,
-Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty, llvm_v4f32_ty,
-llvm_i8_ty], [IntrNoMem]>;
-def int_x86_avx512_mask_sqrt_ps_256 : GCCBuiltin<"__builtin_ia32_sqrtps256_mask">,
-Intrinsic<[llvm_v8f32_ty], [llvm_v8f32_ty, llvm_v8f32_ty,
-llvm_i8_ty], [IntrNoMem]>;
-def int_x86_avx512_mask_sqrt_ps_512 : GCCBuiltin<"__builtin_ia32_sqrtps512_mask">,
+def int_x86_avx512_mask_sqrt_ps_512 :
 Intrinsic<[llvm_v16f32_ty], [llvm_v16f32_ty, llvm_v16f32_ty,
 llvm_i16_ty, llvm_i32_ty], [IntrNoMem]>;
 def int_x86_avx512_mask_fixupimm_pd_128 :
 GCCBuiltin<"__builtin_ia32_fixupimmpd128_mask">,
 Intrinsic<[llvm_v2f64_ty],
 [llvm_v2f64_ty, llvm_v2f64_ty, llvm_v2i64_ty, llvm_i32_ty, llvm_i8_ty],
 [IntrNoMem]>;
 def int_x86_avx512_maskz_fixupimm_pd_128 :
@@ 1,776 lines hidden @@

llvm/trunk/lib/IR/AutoUpgrade.cpp

@@ 91 lines hidden @@ if (Name=="ssse3.pabs.b.128" || // Added in 6.0
 Name.startswith("fma.vfnmadd.") || // Added in 7.0
 Name.startswith("fma.vfnmsub.") || // Added in 7.0
 Name.startswith("avx512.mask.shuf.i") || // Added in 6.0
 Name.startswith("avx512.mask.shuf.f") || // Added in 6.0
 Name.startswith("avx512.kunpck") || //added in 6.0
 Name.startswith("avx2.pabs.") || // Added in 6.0
 Name.startswith("avx512.mask.pabs.") || // Added in 6.0
 Name.startswith("avx512.broadcastm") || // Added in 6.0
+Name == "sse.sqrt.ss" || // Added in 7.0
+Name == "sse2.sqrt.sd" || // Added in 7.0
+Name == "avx512.mask.sqrt.ps.128" || // Added in 7.0
+Name == "avx512.mask.sqrt.ps.256" || // Added in 7.0
+Name == "avx512.mask.sqrt.pd.128" || // Added in 7.0
+Name == "avx512.mask.sqrt.pd.256" || // Added in 7.0
+Name.startswith("avx.sqrt.p") || // Added in 7.0
+Name.startswith("sse2.sqrt.p") || // Added in 7.0
+Name.startswith("sse.sqrt.p") || // Added in 7.0
 Name.startswith("avx512.mask.pbroadcast") || // Added in 6.0
 Name.startswith("sse2.pcmpeq.") || // Added in 3.1
 Name.startswith("sse2.pcmpgt.") || // Added in 3.1
 Name.startswith("avx2.pcmpeq.") || // Added in 3.1
 Name.startswith("avx2.pcmpgt.") || // Added in 3.1
 Name.startswith("avx512.mask.pcmpeq.") || // Added in 3.9
 Name.startswith("avx512.mask.pcmpgt.") || // Added in 3.9
 Name.startswith("avx.vperm2f128.") || // Added in 6.0
@@ 1,362 lines hidden @@ if (!NewFn) {
 } else if (IsX86 && (Name.startswith("avx512.broadcastm"))) {
 Type *ExtTy = Type::getInt32Ty(C);
 if (CI->getOperand(0)->getType()->isIntegerTy(8))
 ExtTy = Type::getInt64Ty(C);
 unsigned NumElts = CI->getType()->getPrimitiveSizeInBits() /
 ExtTy->getPrimitiveSizeInBits();
 Rep = Builder.CreateZExt(CI->getArgOperand(0), ExtTy);
 Rep = Builder.CreateVectorSplat(NumElts, Rep);
+} else if (IsX86 && (Name == "sse.sqrt.ss" ||
+Name == "sse2.sqrt.sd")) {
+Value *Vec = CI->getArgOperand(0);
+Value *Elt0 = Builder.CreateExtractElement(Vec, (uint64_t)0);
+Function *Intr = Intrinsic::getDeclaration(F->getParent(),
+Intrinsic::sqrt, Elt0->getType());
+Elt0 = Builder.CreateCall(Intr, Elt0);
+Rep = Builder.CreateInsertElement(Vec, Elt0, (uint64_t)0);
+} else if (IsX86 && (Name.startswith("avx.sqrt.p") ||
+Name.startswith("sse2.sqrt.p") ||
+Name.startswith("sse.sqrt.p"))) {
+Rep = Builder.CreateCall(Intrinsic::getDeclaration(F->getParent(),
+Intrinsic::sqrt,
+CI->getType()),
+{CI->getArgOperand(0)});
+} else if (IsX86 && (Name.startswith("avx512.mask.sqrt.p") &&
+!Name.endswith("512"))) {
+Rep = Builder.CreateCall(Intrinsic::getDeclaration(F->getParent(),
+Intrinsic::sqrt,
+CI->getType()),
+{CI->getArgOperand(0)});
+Rep = EmitX86Select(Builder, CI->getArgOperand(2), Rep,
+CI->getArgOperand(1));
 } else if (IsX86 && (Name.startswith("avx512.ptestm") ||
 Name.startswith("avx512.ptestnm"))) {
 Value *Op0 = CI->getArgOperand(0);
 Value *Op1 = CI->getArgOperand(1);
 Value *Mask = CI->getArgOperand(2);
 Rep = Builder.CreateAnd(Op0, Op1);
 llvm::Type *Ty = Op0->getType();
 Value *Zero = llvm::Constant::getNullValue(Ty);
@@ 1,936 lines hidden @@
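
As a reading aid for the AutoUpgrade.cpp changes above, here is a standalone C++ model of what the upgraded IR computes. It is a sketch under assumptions: the function names are hypothetical, vectors are modeled as std::array, and mask bit i is taken to control element i. It does not use or reproduce the LLVM API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Scalar form (sse.sqrt.ss / sse2.sqrt.sd): extract element 0, take its
// square root, insert it back; the remaining elements pass through unchanged.
template <typename T, std::size_t N>
std::array<T, N> upgradedScalarSqrt(std::array<T, N> vec) {
  vec[0] = std::sqrt(vec[0]);
  return vec;
}

// Masked packed form (avx512.mask.sqrt.p{s,d}.{128,256}): square root of every
// element of src, followed by a per-element select against passthru.
template <typename T, std::size_t N>
std::array<T, N> upgradedMaskedSqrt(const std::array<T, N>& src,
                                    const std::array<T, N>& passthru,
                                    std::uint8_t mask) {
  static_assert(N <= 8, "an 8-bit mask covers at most 8 elements");
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = ((mask >> i) & 1u) ? std::sqrt(src[i]) : passthru[i];
  return out;
}
```

The unmasked packed case (avx.sqrt.p*, sse.sqrt.p*, sse2.sqrt.p*) corresponds to `upgradedMaskedSqrt` with every mask bit set.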

llvm/trunk/lib/Target/X86/X86InstrAVX512.td


@@ 8,528 lines hidden @@ defm PSZ : avx512_sqrt_packed_round<opc, !strconcat(OpcodeStr, "ps"),
 sched.PS.ZMM, v16f32_info>,
 EVEX_V512, PS, EVEX_CD8<32, CD8VF>;
 defm PDZ : avx512_sqrt_packed_round<opc, !strconcat(OpcodeStr, "pd"),
 sched.PD.ZMM, v8f64_info>,
 EVEX_V512, VEX_W, PD, EVEX_CD8<64, CD8VF>;
 }

 multiclass avx512_sqrt_scalar<bits<8> opc, string OpcodeStr, X86FoldableSchedWrite sched,
-X86VectorVTInfo _, string Name, Intrinsic Intr> {
+X86VectorVTInfo _, string Name> {
 let ExeDomain = _.ExeDomain in {
 defm r_Int : AVX512_maskable_scalar<opc, MRMSrcReg, _, (outs _.RC:$dst),
 (ins _.RC:$src1, _.RC:$src2), OpcodeStr,
 "$src2, $src1", "$src1, $src2",
 (X86fsqrtRnds (_.VT _.RC:$src1),
 (_.VT _.RC:$src2),
 (i32 FROUND_CURRENT))>,
 Sched<[sched]>;
@@ 24 lines hidden @@ let isCodeGenOnly = 1, hasSideEffects = 0, Predicates=[HasAVX512] in {
 Sched<[sched.Folded, ReadAfterLd]>;
 }
 }

 let Predicates = [HasAVX512] in {
 def : Pat<(_.EltVT (fsqrt _.FRC:$src)),
 (!cast<Instruction>(Name#Zr)
 (_.EltVT (IMPLICIT_DEF)), _.FRC:$src)>;

-def : Pat<(Intr VR128X:$src),
-(!cast<Instruction>(Name#Zr_Int) VR128X:$src,
-VR128X:$src)>;
 }

 let Predicates = [HasAVX512, OptForSize] in {
 def : Pat<(_.EltVT (fsqrt (load addr:$src))),
 (!cast<Instruction>(Name#Zm)
 (_.EltVT (IMPLICIT_DEF)), addr:$src)>;

-def : Pat<(Intr _.ScalarIntMemCPat:$src2),
-(!cast<Instruction>(Name#Zm_Int)
-(_.VT (IMPLICIT_DEF)), addr:$src2)>;
 }
 }

 multiclass avx512_sqrt_scalar_all<bits<8> opc, string OpcodeStr,
 X86SchedWriteSizes sched> {
-defm SSZ : avx512_sqrt_scalar<opc, OpcodeStr#"ss", sched.PS.Scl, f32x_info, NAME#"SS",
-int_x86_sse_sqrt_ss>,
+defm SSZ : avx512_sqrt_scalar<opc, OpcodeStr#"ss", sched.PS.Scl, f32x_info, NAME#"SS">,
 EVEX_CD8<32, CD8VT1>, EVEX_4V, XS;
-defm SDZ : avx512_sqrt_scalar<opc, OpcodeStr#"sd", sched.PD.Scl, f64x_info, NAME#"SD",
-int_x86_sse2_sqrt_sd>,
+defm SDZ : avx512_sqrt_scalar<opc, OpcodeStr#"sd", sched.PD.Scl, f64x_info, NAME#"SD">,
 EVEX_CD8<64, CD8VT1>, EVEX_4V, XD, VEX_W;
 }

 defm VSQRT : avx512_sqrt_packed_all<0x51, "vsqrt", SchedWriteFSqrtSizes>,
 avx512_sqrt_packed_all_round<0x51, "vsqrt", SchedWriteFSqrtSizes>;

 defm VSQRT : avx512_sqrt_scalar_all<0x51, "vsqrt", SchedWriteFSqrtSizes>, VEX_LIG;

@@ 96 lines hidden @@ let Predicates = [BasePredicate] in {
 def : Pat<(Move _.VT:$src1, (scalar_to_vector (X86selects Mask,
 (OpNode (extractelt _.VT:$src2, (iPTR 0))),
 ZeroFP))),
 (!cast<Instruction>("V"#OpcPrefix#r_Intkz)
 OutMask, _.VT:$src2, _.VT:$src1)>;
 }
 }

+defm : avx512_masked_scalar<fsqrt, "SQRTSSZ", X86Movss,
+(v1i1 (scalar_to_vector (i8 (trunc (i32 GR32:$mask))))), v4f32x_info,
+fp32imm0, (COPY_TO_REGCLASS $mask, VK1WM), HasAVX512>;
+defm : avx512_masked_scalar<fsqrt, "SQRTSDZ", X86Movsd,
+(v1i1 (scalar_to_vector (i8 (trunc (i32 GR32:$mask))))), v2f64x_info,
+fp64imm0, (COPY_TO_REGCLASS $mask, VK1WM), HasAVX512>;

 multiclass avx512_masked_scalar_imm<SDNode OpNode, string OpcPrefix, SDNode Move,
 dag Mask, X86VectorVTInfo _, PatLeaf ZeroFP,
 bits<8> ImmV, dag OutMask,
 Predicate BasePredicate> {
 let Predicates = [BasePredicate] in {
 def : Pat<(Move _.VT:$src1, (scalar_to_vector (X86selects Mask,
 (OpNode (extractelt _.VT:$src2, (iPTR 0))),
 (extractelt _.VT:$dst, (iPTR 0))))),
@@ 2,829 lines hidden @@
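
The zero-masking pattern visible in the avx512_masked_scalar context above (the `r_Intkz` form) matches a DAG that can be modeled in plain C++ as follows. This is an illustrative sketch, not generated or LLVM code; the function name is hypothetical and vectors are modeled as std::array.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Model of: Move(src1, scalar_to_vector(select(mask, sqrt(src2[0]), 0.0))).
// Element 0 of the result is sqrt(src2[0]) when the mask bit is set and zero
// otherwise; all higher elements are taken from src1.
template <typename T, std::size_t N>
std::array<T, N> maskedScalarSqrtZeroing(std::array<T, N> src1,
                                         const std::array<T, N>& src2,
                                         bool maskBit) {
  src1[0] = maskBit ? std::sqrt(src2[0]) : T(0);
  return src1;
}
```

The two new `defm` lines instantiate these patterns with `fsqrt` as the operation, so that IR written as extract, sqrt, select, insert can still be matched to a single masked scalar sqrt instruction.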

llvm/trunk/lib/Target/X86/X86InstrSSE.td


@@ 2,755 lines hidden @@
 /// scalar) and leaves the top elements undefined.
 ///
 /// And, we have a special variant form for a full-vector intrinsic form.

 /// sse_fp_unop_s - SSE1 unops in scalar form
 /// For the non-AVX defs, we need $src1 to be tied to $dst because
 /// the HW instructions are 2 operand / destructive.
 multiclass sse_fp_unop_s<bits<8> opc, string OpcodeStr, RegisterClass RC,
-ValueType vt, ValueType ScalarVT,
-X86MemOperand x86memop,
-Operand intmemop, ComplexPattern int_cpat,
-Intrinsic Intr, SDNode OpNode, Domain d,
-X86FoldableSchedWrite sched,
-Predicate target> {
+ValueType ScalarVT, X86MemOperand x86memop,
+Operand intmemop, SDNode OpNode, Domain d,
+X86FoldableSchedWrite sched, Predicate target> {
 let hasSideEffects = 0 in {
 def r : I<opc, MRMSrcReg, (outs RC:$dst), (ins RC:$src1),
 !strconcat(OpcodeStr, "\t{$src1, $dst|$dst, $src1}"),
 [(set RC:$dst, (OpNode RC:$src1))], d>, Sched<[sched]>,
 Requires<[target]>;
 let mayLoad = 1 in
 def m : I<opc, MRMSrcMem, (outs RC:$dst), (ins x86memop:$src1),
 !strconcat(OpcodeStr, "\t{$src1, $dst|$dst, $src1}"),
 [(set RC:$dst, (OpNode (load addr:$src1)))], d>,
 Sched<[sched.Folded, ReadAfterLd]>,
 Requires<[target, OptForSize]>;

 let isCodeGenOnly = 1, Constraints = "$src1 = $dst", ExeDomain = d in {
 def r_Int : I<opc, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src1, VR128:$src2),
 !strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"), []>,
 Sched<[sched]>;
 let mayLoad = 1 in
 def m_Int : I<opc, MRMSrcMem, (outs VR128:$dst), (ins VR128:$src1, intmemop:$src2),
 !strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"), []>,
 Sched<[sched.Folded, ReadAfterLd]>;
 }
 }

+}
+
+multiclass sse_fp_unop_s_intr<RegisterClass RC, ValueType vt,
+ComplexPattern int_cpat, Intrinsic Intr,
+Predicate target, string Suffix> {
 let Predicates = [target] in {
 // These are unary operations, but they are modeled as having 2 source operands
 // because the high elements of the destination are unchanged in SSE.
 def : Pat<(Intr VR128:$src),
 (!cast<Instruction>(NAME#r_Int) VR128:$src, VR128:$src)>;
 }
 // We don't want to fold scalar loads into these instructions unless
 // optimizing for size. This is because the folded instruction will have a
 // partial register update, while the unfolded sequence will not, e.g.
// movss mem, %xmm0		// movss mem, %xmm0
// rcpss %xmm0, %xmm0		// rcpss %xmm0, %xmm0
// which has a clobber before the rcp, vs.		// which has a clobber before the rcp, vs.
// rcpss mem, %xmm0		// rcpss mem, %xmm0
let Predicates = [target, OptForSize] in {		let Predicates = [target, OptForSize] in {
def : Pat<(Intr int_cpat:$src2),		def : Pat<(Intr int_cpat:$src2),
(!cast<Instruction>(NAME#m_Int)		(!cast<Instruction>(NAME#m_Int)
(vt (IMPLICIT_DEF)), addr:$src2)>;		(vt (IMPLICIT_DEF)), addr:$src2)>;
}		}
}		}

		multiclass avx_fp_unop_s_intr<RegisterClass RC, ValueType vt, ComplexPattern int_cpat,
		Intrinsic Intr, Predicate target> {
		let Predicates = [target] in {
		def : Pat<(Intr VR128:$src),
		(!cast<Instruction>(NAME#r_Int) VR128:$src,
		VR128:$src)>;
		}
		let Predicates = [target, OptForSize] in {
		def : Pat<(Intr int_cpat:$src2),
		(!cast<Instruction>(NAME#m_Int)
		(vt (IMPLICIT_DEF)), addr:$src2)>;
		}
		}

multiclass avx_fp_unop_s<bits<8> opc, string OpcodeStr, RegisterClass RC,		multiclass avx_fp_unop_s<bits<8> opc, string OpcodeStr, RegisterClass RC,
ValueType vt, ValueType ScalarVT,		ValueType ScalarVT, X86MemOperand x86memop,
X86MemOperand x86memop,		Operand intmemop, SDNode OpNode, Domain d,
Operand intmemop, ComplexPattern int_cpat,
Intrinsic Intr, SDNode OpNode, Domain d,
X86FoldableSchedWrite sched, Predicate target> {		X86FoldableSchedWrite sched, Predicate target> {
let hasSideEffects = 0 in {		let hasSideEffects = 0 in {
def r : I<opc, MRMSrcReg, (outs RC:$dst), (ins RC:$src1, RC:$src2),		def r : I<opc, MRMSrcReg, (outs RC:$dst), (ins RC:$src1, RC:$src2),
!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),		!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),
[], d>, Sched<[sched]>;		[], d>, Sched<[sched]>;
let mayLoad = 1 in		let mayLoad = 1 in
def m : I<opc, MRMSrcMem, (outs RC:$dst), (ins RC:$src1, x86memop:$src2),		def m : I<opc, MRMSrcMem, (outs RC:$dst), (ins RC:$src1, x86memop:$src2),
!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),		!strconcat(OpcodeStr, "\t{$src2, $src1, $dst\|$dst, $src1, $src2}"),
Show All 18 Lines	multiclass avx_fp_unop_s<bits<8> opc, string OpcodeStr, RegisterClass RC,
// vrcpss %xmm0, %xmm0, %xmm0		// vrcpss %xmm0, %xmm0, %xmm0
// which has a clobber before the rcp, vs.		// which has a clobber before the rcp, vs.
// vrcpss mem, %xmm0, %xmm0		// vrcpss mem, %xmm0, %xmm0
// TODO: In theory, we could fold the load, and avoid the stall caused by		// TODO: In theory, we could fold the load, and avoid the stall caused by
// the partial register store, either in BreakFalseDeps or with smarter RA.		// the partial register store, either in BreakFalseDeps or with smarter RA.
let Predicates = [target] in {		let Predicates = [target] in {
def : Pat<(OpNode RC:$src), (!cast<Instruction>(NAME#r)		def : Pat<(OpNode RC:$src), (!cast<Instruction>(NAME#r)
(ScalarVT (IMPLICIT_DEF)), RC:$src)>;		(ScalarVT (IMPLICIT_DEF)), RC:$src)>;
def : Pat<(Intr VR128:$src),
(!cast<Instruction>(NAME#r_Int) VR128:$src,
VR128:$src)>;
}		}
let Predicates = [target, OptForSize] in {		let Predicates = [target, OptForSize] in {
def : Pat<(Intr int_cpat:$src2),
(!cast<Instruction>(NAME#m_Int)
(vt (IMPLICIT_DEF)), addr:$src2)>;
def : Pat<(ScalarVT (OpNode (load addr:$src))),		def : Pat<(ScalarVT (OpNode (load addr:$src))),
(!cast<Instruction>(NAME#m) (ScalarVT (IMPLICIT_DEF)),		(!cast<Instruction>(NAME#m) (ScalarVT (IMPLICIT_DEF)),
addr:$src)>;		addr:$src)>;
}		}
}		}

/// sse1_fp_unop_p - SSE1 unops in packed form.		/// sse1_fp_unop_p - SSE1 unops in packed form.
multiclass sse1_fp_unop_p<bits<8> opc, string OpcodeStr, SDNode OpNode,		multiclass sse1_fp_unop_p<bits<8> opc, string OpcodeStr, SDNode OpNode,
▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	def PDr : PDI<opc, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
[(set VR128:$dst, (v2f64 (OpNode VR128:$src)))]>,		[(set VR128:$dst, (v2f64 (OpNode VR128:$src)))]>,
Sched<[sched.XMM]>;		Sched<[sched.XMM]>;
def PDm : PDI<opc, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),		def PDm : PDI<opc, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),
!strconcat(OpcodeStr, "pd\t{$src, $dst\|$dst, $src}"),		!strconcat(OpcodeStr, "pd\t{$src, $dst\|$dst, $src}"),
[(set VR128:$dst, (OpNode (memopv2f64 addr:$src)))]>,		[(set VR128:$dst, (OpNode (memopv2f64 addr:$src)))]>,
Sched<[sched.XMM.Folded]>;		Sched<[sched.XMM.Folded]>;
}		}

		multiclass sse1_fp_unop_s_intr<bits<8> opc, string OpcodeStr, SDNode OpNode,
		X86SchedWriteWidths sched, Predicate AVXTarget> {
		defm SS : sse_fp_unop_s_intr<FR32, v4f32, sse_load_f32,
		!cast<Intrinsic>("int_x86_sse_"##OpcodeStr##_ss),
		UseSSE1, "SS">, XS;
		defm V#NAME#SS : avx_fp_unop_s_intr<FR32, v4f32, sse_load_f32,
		!cast<Intrinsic>("int_x86_sse_"##OpcodeStr##_ss),
		AVXTarget>,
		XS, VEX_4V, VEX_LIG, VEX_WIG, NotMemoryFoldable;
		}

multiclass sse1_fp_unop_s<bits<8> opc, string OpcodeStr, SDNode OpNode,		multiclass sse1_fp_unop_s<bits<8> opc, string OpcodeStr, SDNode OpNode,
X86SchedWriteWidths sched, Predicate AVXTarget> {		X86SchedWriteWidths sched, Predicate AVXTarget> {
defm SS : sse_fp_unop_s<opc, OpcodeStr##ss, FR32, v4f32, f32, f32mem,		defm SS : sse_fp_unop_s<opc, OpcodeStr##ss, FR32, f32, f32mem,
ssmem, sse_load_f32,		ssmem, OpNode, SSEPackedSingle, sched.Scl, UseSSE1>, XS;
!cast<Intrinsic>("int_x86_sse_"##OpcodeStr##_ss), OpNode,		defm V#NAME#SS : avx_fp_unop_s<opc, "v"#OpcodeStr##ss, FR32, f32,
SSEPackedSingle, sched.Scl, UseSSE1>, XS;		f32mem, ssmem, OpNode, SSEPackedSingle, sched.Scl, AVXTarget>,
defm V#NAME#SS : avx_fp_unop_s<opc, "v"#OpcodeStr##ss, FR32, v4f32, f32,		XS, VEX_4V, VEX_LIG, VEX_WIG;
f32mem, ssmem, sse_load_f32,
!cast<Intrinsic>("int_x86_sse_"##OpcodeStr##_ss), OpNode,
SSEPackedSingle, sched.Scl, AVXTarget>, XS, VEX_4V,
VEX_LIG, VEX_WIG;
}		}

multiclass sse2_fp_unop_s<bits<8> opc, string OpcodeStr, SDNode OpNode,		multiclass sse2_fp_unop_s<bits<8> opc, string OpcodeStr, SDNode OpNode,
X86SchedWriteWidths sched, Predicate AVXTarget> {		X86SchedWriteWidths sched, Predicate AVXTarget> {
defm SD : sse_fp_unop_s<opc, OpcodeStr##sd, FR64, v2f64, f64, f64mem,		defm SD : sse_fp_unop_s<opc, OpcodeStr##sd, FR64, f64, f64mem,
sdmem, sse_load_f64,		sdmem, OpNode, SSEPackedDouble, sched.Scl, UseSSE2>, XD;
!cast<Intrinsic>("int_x86_sse2_"##OpcodeStr##_sd),		defm V#NAME#SD : avx_fp_unop_s<opc, "v"#OpcodeStr##sd, FR64, f64,
OpNode, SSEPackedDouble, sched.Scl, UseSSE2>, XD;		f64mem, sdmem, OpNode, SSEPackedDouble, sched.Scl, AVXTarget>,
defm V#NAME#SD : avx_fp_unop_s<opc, "v"#OpcodeStr##sd, FR64, v2f64, f64,
f64mem, sdmem, sse_load_f64,
!cast<Intrinsic>("int_x86_sse2_"##OpcodeStr##_sd),
OpNode, SSEPackedDouble, sched.Scl, AVXTarget>,
XD, VEX_4V, VEX_LIG, VEX_WIG;		XD, VEX_4V, VEX_LIG, VEX_WIG;
}		}

// Square root.		// Square root.
defm SQRT : sse1_fp_unop_s<0x51, "sqrt", fsqrt, SchedWriteFSqrt, UseAVX>,		defm SQRT : sse1_fp_unop_s<0x51, "sqrt", fsqrt, SchedWriteFSqrt, UseAVX>,
sse1_fp_unop_p<0x51, "sqrt", fsqrt, SchedWriteFSqrt, [HasAVX, NoVLX]>,		sse1_fp_unop_p<0x51, "sqrt", fsqrt, SchedWriteFSqrt, [HasAVX, NoVLX]>,
sse2_fp_unop_s<0x51, "sqrt", fsqrt, SchedWriteFSqrt64, UseAVX>,		sse2_fp_unop_s<0x51, "sqrt", fsqrt, SchedWriteFSqrt64, UseAVX>,
sse2_fp_unop_p<0x51, "sqrt", fsqrt, SchedWriteFSqrt64>;		sse2_fp_unop_p<0x51, "sqrt", fsqrt, SchedWriteFSqrt64>;

// Reciprocal approximations. Note that these typically require refinement		// Reciprocal approximations. Note that these typically require refinement
// in order to obtain suitable precision.		// in order to obtain suitable precision.
defm RSQRT : sse1_fp_unop_s<0x52, "rsqrt", X86frsqrt, SchedWriteFRsqrt, HasAVX>,		defm RSQRT : sse1_fp_unop_s<0x52, "rsqrt", X86frsqrt, SchedWriteFRsqrt, HasAVX>,
		sse1_fp_unop_s_intr<0x52, "rsqrt", X86frsqrt, SchedWriteFRsqrt, HasAVX>,
sse1_fp_unop_p<0x52, "rsqrt", X86frsqrt, SchedWriteFRsqrt, [HasAVX]>;		sse1_fp_unop_p<0x52, "rsqrt", X86frsqrt, SchedWriteFRsqrt, [HasAVX]>;
defm RCP : sse1_fp_unop_s<0x53, "rcp", X86frcp, SchedWriteFRcp, HasAVX>,		defm RCP : sse1_fp_unop_s<0x53, "rcp", X86frcp, SchedWriteFRcp, HasAVX>,
		sse1_fp_unop_s_intr<0x53, "rcp", X86frcp, SchedWriteFRcp, HasAVX>,
sse1_fp_unop_p<0x53, "rcp", X86frcp, SchedWriteFRcp, [HasAVX]>;		sse1_fp_unop_p<0x53, "rcp", X86frcp, SchedWriteFRcp, [HasAVX]>;

// There is no f64 version of the reciprocal approximation instructions.		// There is no f64 version of the reciprocal approximation instructions.

multiclass scalar_unary_math_patterns<SDNode OpNode, string OpcPrefix, SDNode Move,		multiclass scalar_unary_math_patterns<SDNode OpNode, string OpcPrefix, SDNode Move,
ValueType VT, Predicate BasePredicate> {		ValueType VT, Predicate BasePredicate> {
let Predicates = [BasePredicate] in {		let Predicates = [BasePredicate] in {
def : Pat<(VT (Move VT:$dst, (scalar_to_vector		def : Pat<(VT (Move VT:$dst, (scalar_to_vector
Show All 21 Lines	multiclass scalar_unary_math_imm_patterns<SDNode OpNode, string OpcPrefix, SDNode Move,
// Repeat for AVX versions of the instructions.		// Repeat for AVX versions of the instructions.
let Predicates = [HasAVX] in {		let Predicates = [HasAVX] in {
def : Pat<(VT (Move VT:$dst, (scalar_to_vector		def : Pat<(VT (Move VT:$dst, (scalar_to_vector
(OpNode (extractelt VT:$src, 0))))),		(OpNode (extractelt VT:$src, 0))))),
(!cast<Ii8>("V"#OpcPrefix#r_Int) VT:$dst, VT:$src, (i32 ImmV))>;		(!cast<Ii8>("V"#OpcPrefix#r_Int) VT:$dst, VT:$src, (i32 ImmV))>;
}		}
}		}

		defm : scalar_unary_math_patterns<fsqrt, "SQRTSS", X86Movss, v4f32, UseSSE1>;
		defm : scalar_unary_math_patterns<fsqrt, "SQRTSD", X86Movsd, v2f64, UseSSE2>;

multiclass scalar_unary_math_intr_patterns<Intrinsic Intr, string OpcPrefix,		multiclass scalar_unary_math_intr_patterns<Intrinsic Intr, string OpcPrefix,
SDNode Move, ValueType VT,		SDNode Move, ValueType VT,
Predicate BasePredicate> {		Predicate BasePredicate> {
let Predicates = [BasePredicate] in {		let Predicates = [BasePredicate] in {
def : Pat<(VT (Move VT:$dst, (Intr VT:$src))),		def : Pat<(VT (Move VT:$dst, (Intr VT:$src))),
(!cast<I>(OpcPrefix#r_Int) VT:$dst, VT:$src)>;		(!cast<I>(OpcPrefix#r_Int) VT:$dst, VT:$src)>;
}		}

// Repeat for AVX versions of the instructions.		// Repeat for AVX versions of the instructions.
let Predicates = [HasAVX] in {		let Predicates = [HasAVX] in {
def : Pat<(VT (Move VT:$dst, (Intr VT:$src))),		def : Pat<(VT (Move VT:$dst, (Intr VT:$src))),
(!cast<I>("V"#OpcPrefix#r_Int) VT:$dst, VT:$src)>;		(!cast<I>("V"#OpcPrefix#r_Int) VT:$dst, VT:$src)>;
}		}
}		}

defm : scalar_unary_math_intr_patterns<int_x86_sse_rcp_ss, "RCPSS", X86Movss,		defm : scalar_unary_math_intr_patterns<int_x86_sse_rcp_ss, "RCPSS", X86Movss,
v4f32, UseSSE1>;		v4f32, UseSSE1>;
defm : scalar_unary_math_intr_patterns<int_x86_sse_rsqrt_ss, "RSQRTSS", X86Movss,		defm : scalar_unary_math_intr_patterns<int_x86_sse_rsqrt_ss, "RSQRTSS", X86Movss,
v4f32, UseSSE1>;		v4f32, UseSSE1>;
defm : scalar_unary_math_intr_patterns<int_x86_sse_sqrt_ss, "SQRTSS", X86Movss,
v4f32, UseSSE1>;
defm : scalar_unary_math_intr_patterns<int_x86_sse2_sqrt_sd, "SQRTSD", X86Movsd,
v2f64, UseSSE2>;


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// SSE 1 & 2 - Non-temporal stores		// SSE 1 & 2 - Non-temporal stores
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

let AddedComplexity = 400 in { // Prefer non-temporal versions		let AddedComplexity = 400 in { // Prefer non-temporal versions
let Predicates = [HasAVX, NoVLX] in {		let Predicates = [HasAVX, NoVLX] in {

llvm/trunk/lib/Target/X86/X86IntrinsicsInfo.h

Show First 20 Lines • Show All 312 Lines • ▼ Show 20 Lines	static const IntrinsicData IntrinsicsWithoutChain[] = {
X86_INTRINSIC_DATA(avx_min_pd_256, INTR_TYPE_2OP, X86ISD::FMIN, 0),		X86_INTRINSIC_DATA(avx_min_pd_256, INTR_TYPE_2OP, X86ISD::FMIN, 0),
X86_INTRINSIC_DATA(avx_min_ps_256, INTR_TYPE_2OP, X86ISD::FMIN, 0),		X86_INTRINSIC_DATA(avx_min_ps_256, INTR_TYPE_2OP, X86ISD::FMIN, 0),
X86_INTRINSIC_DATA(avx_movmsk_pd_256, INTR_TYPE_1OP, X86ISD::MOVMSK, 0),		X86_INTRINSIC_DATA(avx_movmsk_pd_256, INTR_TYPE_1OP, X86ISD::MOVMSK, 0),
X86_INTRINSIC_DATA(avx_movmsk_ps_256, INTR_TYPE_1OP, X86ISD::MOVMSK, 0),		X86_INTRINSIC_DATA(avx_movmsk_ps_256, INTR_TYPE_1OP, X86ISD::MOVMSK, 0),
X86_INTRINSIC_DATA(avx_rcp_ps_256, INTR_TYPE_1OP, X86ISD::FRCP, 0),		X86_INTRINSIC_DATA(avx_rcp_ps_256, INTR_TYPE_1OP, X86ISD::FRCP, 0),
X86_INTRINSIC_DATA(avx_round_pd_256, ROUNDP, X86ISD::VRNDSCALE, 0),		X86_INTRINSIC_DATA(avx_round_pd_256, ROUNDP, X86ISD::VRNDSCALE, 0),
X86_INTRINSIC_DATA(avx_round_ps_256, ROUNDP, X86ISD::VRNDSCALE, 0),		X86_INTRINSIC_DATA(avx_round_ps_256, ROUNDP, X86ISD::VRNDSCALE, 0),
X86_INTRINSIC_DATA(avx_rsqrt_ps_256, INTR_TYPE_1OP, X86ISD::FRSQRT, 0),		X86_INTRINSIC_DATA(avx_rsqrt_ps_256, INTR_TYPE_1OP, X86ISD::FRSQRT, 0),
X86_INTRINSIC_DATA(avx_sqrt_pd_256, INTR_TYPE_1OP, ISD::FSQRT, 0),
X86_INTRINSIC_DATA(avx_sqrt_ps_256, INTR_TYPE_1OP, ISD::FSQRT, 0),
X86_INTRINSIC_DATA(avx_vpermilvar_pd, INTR_TYPE_2OP, X86ISD::VPERMILPV, 0),		X86_INTRINSIC_DATA(avx_vpermilvar_pd, INTR_TYPE_2OP, X86ISD::VPERMILPV, 0),
X86_INTRINSIC_DATA(avx_vpermilvar_pd_256, INTR_TYPE_2OP, X86ISD::VPERMILPV, 0),		X86_INTRINSIC_DATA(avx_vpermilvar_pd_256, INTR_TYPE_2OP, X86ISD::VPERMILPV, 0),
X86_INTRINSIC_DATA(avx_vpermilvar_ps, INTR_TYPE_2OP, X86ISD::VPERMILPV, 0),		X86_INTRINSIC_DATA(avx_vpermilvar_ps, INTR_TYPE_2OP, X86ISD::VPERMILPV, 0),
X86_INTRINSIC_DATA(avx_vpermilvar_ps_256, INTR_TYPE_2OP, X86ISD::VPERMILPV, 0),		X86_INTRINSIC_DATA(avx_vpermilvar_ps_256, INTR_TYPE_2OP, X86ISD::VPERMILPV, 0),
X86_INTRINSIC_DATA(avx2_packssdw, INTR_TYPE_2OP, X86ISD::PACKSS, 0),		X86_INTRINSIC_DATA(avx2_packssdw, INTR_TYPE_2OP, X86ISD::PACKSS, 0),
X86_INTRINSIC_DATA(avx2_packsswb, INTR_TYPE_2OP, X86ISD::PACKSS, 0),		X86_INTRINSIC_DATA(avx2_packsswb, INTR_TYPE_2OP, X86ISD::PACKSS, 0),
X86_INTRINSIC_DATA(avx2_packusdw, INTR_TYPE_2OP, X86ISD::PACKUS, 0),		X86_INTRINSIC_DATA(avx2_packusdw, INTR_TYPE_2OP, X86ISD::PACKUS, 0),
X86_INTRINSIC_DATA(avx2_packuswb, INTR_TYPE_2OP, X86ISD::PACKUS, 0),		X86_INTRINSIC_DATA(avx2_packuswb, INTR_TYPE_2OP, X86ISD::PACKUS, 0),
▲ Show 20 Lines • Show All 558 Lines • ▼ Show 20 Lines	static const IntrinsicData IntrinsicsWithoutChain[] = {
X86_INTRINSIC_DATA(avx512_mask_scalef_ps_256, INTR_TYPE_2OP_MASK_RM,		X86_INTRINSIC_DATA(avx512_mask_scalef_ps_256, INTR_TYPE_2OP_MASK_RM,
X86ISD::SCALEF, 0),		X86ISD::SCALEF, 0),
X86_INTRINSIC_DATA(avx512_mask_scalef_ps_512, INTR_TYPE_2OP_MASK_RM,		X86_INTRINSIC_DATA(avx512_mask_scalef_ps_512, INTR_TYPE_2OP_MASK_RM,
X86ISD::SCALEF, 0),		X86ISD::SCALEF, 0),
X86_INTRINSIC_DATA(avx512_mask_scalef_sd, INTR_TYPE_SCALAR_MASK_RM,		X86_INTRINSIC_DATA(avx512_mask_scalef_sd, INTR_TYPE_SCALAR_MASK_RM,
X86ISD::SCALEFS, 0),		X86ISD::SCALEFS, 0),
X86_INTRINSIC_DATA(avx512_mask_scalef_ss, INTR_TYPE_SCALAR_MASK_RM,		X86_INTRINSIC_DATA(avx512_mask_scalef_ss, INTR_TYPE_SCALAR_MASK_RM,
X86ISD::SCALEFS, 0),		X86ISD::SCALEFS, 0),
X86_INTRINSIC_DATA(avx512_mask_sqrt_pd_128, INTR_TYPE_1OP_MASK, ISD::FSQRT, 0),
X86_INTRINSIC_DATA(avx512_mask_sqrt_pd_256, INTR_TYPE_1OP_MASK, ISD::FSQRT, 0),
X86_INTRINSIC_DATA(avx512_mask_sqrt_pd_512, INTR_TYPE_1OP_MASK, ISD::FSQRT,		X86_INTRINSIC_DATA(avx512_mask_sqrt_pd_512, INTR_TYPE_1OP_MASK, ISD::FSQRT,
X86ISD::FSQRT_RND),		X86ISD::FSQRT_RND),
X86_INTRINSIC_DATA(avx512_mask_sqrt_ps_128, INTR_TYPE_1OP_MASK, ISD::FSQRT, 0),
X86_INTRINSIC_DATA(avx512_mask_sqrt_ps_256, INTR_TYPE_1OP_MASK, ISD::FSQRT, 0),
X86_INTRINSIC_DATA(avx512_mask_sqrt_ps_512, INTR_TYPE_1OP_MASK, ISD::FSQRT,		X86_INTRINSIC_DATA(avx512_mask_sqrt_ps_512, INTR_TYPE_1OP_MASK, ISD::FSQRT,
X86ISD::FSQRT_RND),		X86ISD::FSQRT_RND),
X86_INTRINSIC_DATA(avx512_mask_sqrt_sd, INTR_TYPE_SCALAR_MASK_RM,		X86_INTRINSIC_DATA(avx512_mask_sqrt_sd, INTR_TYPE_SCALAR_MASK_RM,
X86ISD::FSQRTS_RND, 0),		X86ISD::FSQRTS_RND, 0),
X86_INTRINSIC_DATA(avx512_mask_sqrt_ss, INTR_TYPE_SCALAR_MASK_RM,		X86_INTRINSIC_DATA(avx512_mask_sqrt_ss, INTR_TYPE_SCALAR_MASK_RM,
X86ISD::FSQRTS_RND, 0),		X86ISD::FSQRTS_RND, 0),
X86_INTRINSIC_DATA(avx512_mask_sub_sd_round, INTR_TYPE_SCALAR_MASK_RM,		X86_INTRINSIC_DATA(avx512_mask_sub_sd_round, INTR_TYPE_SCALAR_MASK_RM,
X86ISD::FSUBS_RND, 0),		X86ISD::FSUBS_RND, 0),
▲ Show 20 Lines • Show All 373 Lines • ▼ Show 20 Lines	static const IntrinsicData IntrinsicsWithoutChain[] = {
X86_INTRINSIC_DATA(sse_comineq_ss, COMI, X86ISD::COMI, ISD::SETNE),		X86_INTRINSIC_DATA(sse_comineq_ss, COMI, X86ISD::COMI, ISD::SETNE),
X86_INTRINSIC_DATA(sse_max_ps, INTR_TYPE_2OP, X86ISD::FMAX, 0),		X86_INTRINSIC_DATA(sse_max_ps, INTR_TYPE_2OP, X86ISD::FMAX, 0),
X86_INTRINSIC_DATA(sse_max_ss, INTR_TYPE_2OP, X86ISD::FMAXS, 0),		X86_INTRINSIC_DATA(sse_max_ss, INTR_TYPE_2OP, X86ISD::FMAXS, 0),
X86_INTRINSIC_DATA(sse_min_ps, INTR_TYPE_2OP, X86ISD::FMIN, 0),		X86_INTRINSIC_DATA(sse_min_ps, INTR_TYPE_2OP, X86ISD::FMIN, 0),
X86_INTRINSIC_DATA(sse_min_ss, INTR_TYPE_2OP, X86ISD::FMINS, 0),		X86_INTRINSIC_DATA(sse_min_ss, INTR_TYPE_2OP, X86ISD::FMINS, 0),
X86_INTRINSIC_DATA(sse_movmsk_ps, INTR_TYPE_1OP, X86ISD::MOVMSK, 0),		X86_INTRINSIC_DATA(sse_movmsk_ps, INTR_TYPE_1OP, X86ISD::MOVMSK, 0),
X86_INTRINSIC_DATA(sse_rcp_ps, INTR_TYPE_1OP, X86ISD::FRCP, 0),		X86_INTRINSIC_DATA(sse_rcp_ps, INTR_TYPE_1OP, X86ISD::FRCP, 0),
X86_INTRINSIC_DATA(sse_rsqrt_ps, INTR_TYPE_1OP, X86ISD::FRSQRT, 0),		X86_INTRINSIC_DATA(sse_rsqrt_ps, INTR_TYPE_1OP, X86ISD::FRSQRT, 0),
X86_INTRINSIC_DATA(sse_sqrt_ps, INTR_TYPE_1OP, ISD::FSQRT, 0),
X86_INTRINSIC_DATA(sse_ucomieq_ss, COMI, X86ISD::UCOMI, ISD::SETEQ),		X86_INTRINSIC_DATA(sse_ucomieq_ss, COMI, X86ISD::UCOMI, ISD::SETEQ),
X86_INTRINSIC_DATA(sse_ucomige_ss, COMI, X86ISD::UCOMI, ISD::SETGE),		X86_INTRINSIC_DATA(sse_ucomige_ss, COMI, X86ISD::UCOMI, ISD::SETGE),
X86_INTRINSIC_DATA(sse_ucomigt_ss, COMI, X86ISD::UCOMI, ISD::SETGT),		X86_INTRINSIC_DATA(sse_ucomigt_ss, COMI, X86ISD::UCOMI, ISD::SETGT),
X86_INTRINSIC_DATA(sse_ucomile_ss, COMI, X86ISD::UCOMI, ISD::SETLE),		X86_INTRINSIC_DATA(sse_ucomile_ss, COMI, X86ISD::UCOMI, ISD::SETLE),
X86_INTRINSIC_DATA(sse_ucomilt_ss, COMI, X86ISD::UCOMI, ISD::SETLT),		X86_INTRINSIC_DATA(sse_ucomilt_ss, COMI, X86ISD::UCOMI, ISD::SETLT),
X86_INTRINSIC_DATA(sse_ucomineq_ss, COMI, X86ISD::UCOMI, ISD::SETNE),		X86_INTRINSIC_DATA(sse_ucomineq_ss, COMI, X86ISD::UCOMI, ISD::SETNE),
X86_INTRINSIC_DATA(sse2_cmp_pd, INTR_TYPE_3OP, X86ISD::CMPP, 0),		X86_INTRINSIC_DATA(sse2_cmp_pd, INTR_TYPE_3OP, X86ISD::CMPP, 0),
X86_INTRINSIC_DATA(sse2_comieq_sd, COMI, X86ISD::COMI, ISD::SETEQ),		X86_INTRINSIC_DATA(sse2_comieq_sd, COMI, X86ISD::COMI, ISD::SETEQ),
Show All 39 Lines
X86_INTRINSIC_DATA(sse2_psrl_w, INTR_TYPE_2OP, X86ISD::VSRL, 0),		X86_INTRINSIC_DATA(sse2_psrl_w, INTR_TYPE_2OP, X86ISD::VSRL, 0),
X86_INTRINSIC_DATA(sse2_psrli_d, VSHIFT, X86ISD::VSRLI, 0),		X86_INTRINSIC_DATA(sse2_psrli_d, VSHIFT, X86ISD::VSRLI, 0),
X86_INTRINSIC_DATA(sse2_psrli_q, VSHIFT, X86ISD::VSRLI, 0),		X86_INTRINSIC_DATA(sse2_psrli_q, VSHIFT, X86ISD::VSRLI, 0),
X86_INTRINSIC_DATA(sse2_psrli_w, VSHIFT, X86ISD::VSRLI, 0),		X86_INTRINSIC_DATA(sse2_psrli_w, VSHIFT, X86ISD::VSRLI, 0),
X86_INTRINSIC_DATA(sse2_psubs_b, INTR_TYPE_2OP, X86ISD::SUBS, 0),		X86_INTRINSIC_DATA(sse2_psubs_b, INTR_TYPE_2OP, X86ISD::SUBS, 0),
X86_INTRINSIC_DATA(sse2_psubs_w, INTR_TYPE_2OP, X86ISD::SUBS, 0),		X86_INTRINSIC_DATA(sse2_psubs_w, INTR_TYPE_2OP, X86ISD::SUBS, 0),
X86_INTRINSIC_DATA(sse2_psubus_b, INTR_TYPE_2OP, X86ISD::SUBUS, 0),		X86_INTRINSIC_DATA(sse2_psubus_b, INTR_TYPE_2OP, X86ISD::SUBUS, 0),
X86_INTRINSIC_DATA(sse2_psubus_w, INTR_TYPE_2OP, X86ISD::SUBUS, 0),		X86_INTRINSIC_DATA(sse2_psubus_w, INTR_TYPE_2OP, X86ISD::SUBUS, 0),
X86_INTRINSIC_DATA(sse2_sqrt_pd, INTR_TYPE_1OP, ISD::FSQRT, 0),
X86_INTRINSIC_DATA(sse2_ucomieq_sd, COMI, X86ISD::UCOMI, ISD::SETEQ),		X86_INTRINSIC_DATA(sse2_ucomieq_sd, COMI, X86ISD::UCOMI, ISD::SETEQ),
X86_INTRINSIC_DATA(sse2_ucomige_sd, COMI, X86ISD::UCOMI, ISD::SETGE),		X86_INTRINSIC_DATA(sse2_ucomige_sd, COMI, X86ISD::UCOMI, ISD::SETGE),
X86_INTRINSIC_DATA(sse2_ucomigt_sd, COMI, X86ISD::UCOMI, ISD::SETGT),		X86_INTRINSIC_DATA(sse2_ucomigt_sd, COMI, X86ISD::UCOMI, ISD::SETGT),
X86_INTRINSIC_DATA(sse2_ucomile_sd, COMI, X86ISD::UCOMI, ISD::SETLE),		X86_INTRINSIC_DATA(sse2_ucomile_sd, COMI, X86ISD::UCOMI, ISD::SETLE),
X86_INTRINSIC_DATA(sse2_ucomilt_sd, COMI, X86ISD::UCOMI, ISD::SETLT),		X86_INTRINSIC_DATA(sse2_ucomilt_sd, COMI, X86ISD::UCOMI, ISD::SETLT),
X86_INTRINSIC_DATA(sse2_ucomineq_sd, COMI, X86ISD::UCOMI, ISD::SETNE),		X86_INTRINSIC_DATA(sse2_ucomineq_sd, COMI, X86ISD::UCOMI, ISD::SETNE),
X86_INTRINSIC_DATA(sse3_addsub_pd, INTR_TYPE_2OP, X86ISD::ADDSUB, 0),		X86_INTRINSIC_DATA(sse3_addsub_pd, INTR_TYPE_2OP, X86ISD::ADDSUB, 0),
X86_INTRINSIC_DATA(sse3_addsub_ps, INTR_TYPE_2OP, X86ISD::ADDSUB, 0),		X86_INTRINSIC_DATA(sse3_addsub_ps, INTR_TYPE_2OP, X86ISD::ADDSUB, 0),

llvm/trunk/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp

Show First 20 Lines • Show All 1,287 Lines • ▼ Show 20 Lines	case Intrinsic::x86_xop_vfrcz_sd:

// Only the lower element is undefined. The high elements are zero.		// Only the lower element is undefined. The high elements are zero.
UndefElts = UndefElts[0];		UndefElts = UndefElts[0];
break;		break;

// Unary scalar-as-vector operations that work column-wise.		// Unary scalar-as-vector operations that work column-wise.
case Intrinsic::x86_sse_rcp_ss:		case Intrinsic::x86_sse_rcp_ss:
case Intrinsic::x86_sse_rsqrt_ss:		case Intrinsic::x86_sse_rsqrt_ss:
case Intrinsic::x86_sse_sqrt_ss:
case Intrinsic::x86_sse2_sqrt_sd:
TmpV = SimplifyDemandedVectorElts(II->getArgOperand(0), DemandedElts,		TmpV = SimplifyDemandedVectorElts(II->getArgOperand(0), DemandedElts,
UndefElts, Depth + 1);		UndefElts, Depth + 1);
if (TmpV) { II->setArgOperand(0, TmpV); MadeChange = true; }		if (TmpV) { II->setArgOperand(0, TmpV); MadeChange = true; }

// If lowest element of a scalar op isn't used then use Arg0.		// If lowest element of a scalar op isn't used then use Arg0.
if (!DemandedElts[0]) {		if (!DemandedElts[0]) {
Worklist.Add(II);		Worklist.Add(II);
return II->getArgOperand(0);		return II->getArgOperand(0);

llvm/trunk/test/CodeGen/X86/avx-intrinsics-fast-isel.ll

	Show First 20 Lines • Show All 3,012 Lines • ▼ Show 20 Lines
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: vsqrtpd %ymm0, %ymm0			; X32-NEXT: vsqrtpd %ymm0, %ymm0
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test_mm256_sqrt_pd:			; X64-LABEL: test_mm256_sqrt_pd:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: vsqrtpd %ymm0, %ymm0			; X64-NEXT: vsqrtpd %ymm0, %ymm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%res = call <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double> %a0)			entry:
	ret <4 x double> %res			%0 = tail call <4 x double> @llvm.sqrt.v4f64(<4 x double> %a0) #2
				ret <4 x double> %0
	}			}
	declare <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double>) nounwind readnone
				declare <4 x double> @llvm.sqrt.v4f64(<4 x double>) #1

	define <8 x float> @test_mm256_sqrt_ps(<8 x float> %a0) nounwind {			define <8 x float> @test_mm256_sqrt_ps(<8 x float> %a0) nounwind {
	; X32-LABEL: test_mm256_sqrt_ps:			; X32-LABEL: test_mm256_sqrt_ps:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: vsqrtps %ymm0, %ymm0			; X32-NEXT: vsqrtps %ymm0, %ymm0
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test_mm256_sqrt_ps:			; X64-LABEL: test_mm256_sqrt_ps:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: vsqrtps %ymm0, %ymm0			; X64-NEXT: vsqrtps %ymm0, %ymm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%res = call <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float> %a0)			entry:
	ret <8 x float> %res			%0 = tail call <8 x float> @llvm.sqrt.v8f32(<8 x float> %a0) #2
				ret <8 x float> %0
	}			}
	declare <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float>) nounwind readnone
				declare <8 x float> @llvm.sqrt.v8f32(<8 x float>) #1

	define void @test_mm256_store_pd(double* %a0, <4 x double> %a1) nounwind {			define void @test_mm256_store_pd(double* %a0, <4 x double> %a1) nounwind {
	; X32-LABEL: test_mm256_store_pd:			; X32-LABEL: test_mm256_store_pd:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-NEXT: vmovaps %ymm0, (%eax)			; X32-NEXT: vmovaps %ymm0, (%eax)
	; X32-NEXT: vzeroupper			; X32-NEXT: vzeroupper
	; X32-NEXT: retl			; X32-NEXT: retl

llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx -show-mc-encoding \| FileCheck %s --check-prefixes=CHECK,X86,AVX,X86-AVX			; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx -show-mc-encoding \| FileCheck %s --check-prefixes=CHECK,X86,AVX,X86-AVX
	; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding \| FileCheck %s --check-prefixes=CHECK,X86,AVX512VL,X86-AVX512VL			; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding \| FileCheck %s --check-prefixes=CHECK,X86,AVX512VL,X86-AVX512VL
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx -show-mc-encoding \| FileCheck %s --check-prefixes=CHECK,X64,AVX,X64-AVX			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx -show-mc-encoding \| FileCheck %s --check-prefixes=CHECK,X64,AVX,X64-AVX
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding \| FileCheck %s --check-prefixes=CHECK,X64,AVX512VL,X64-AVX512VL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding \| FileCheck %s --check-prefixes=CHECK,X64,AVX512VL,X64-AVX512VL

	; We don't check any vinsertf128 variant with immediate 0 because that's just a blend.			; We don't check any vinsertf128 variant with immediate 0 because that's just a blend.

				define <4 x double> @test_x86_avx_sqrt_pd_256(<4 x double> %a0) {
				; AVX-LABEL: test_x86_avx_sqrt_pd_256:
				; AVX: # %bb.0:
				; AVX-NEXT: vsqrtpd %ymm0, %ymm0 # encoding: [0xc5,0xfd,0x51,0xc0]
				; AVX-NEXT: ret{{[l\|q]}} # encoding: [0xc3]
				;
				; AVX512VL-LABEL: test_x86_avx_sqrt_pd_256:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vsqrtpd %ymm0, %ymm0 # EVEX TO VEX Compression encoding: [0xc5,0xfd,0x51,0xc0]
				; AVX512VL-NEXT: ret{{[l\|q]}} # encoding: [0xc3]
				%res = call <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double> %a0) ; <<4 x double>> [#uses=1]
				ret <4 x double> %res
				}
				declare <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double>) nounwind readnone

				define <8 x float> @test_x86_avx_sqrt_ps_256(<8 x float> %a0) {
				; AVX-LABEL: test_x86_avx_sqrt_ps_256:
				; AVX: # %bb.0:
				; AVX-NEXT: vsqrtps %ymm0, %ymm0 # encoding: [0xc5,0xfc,0x51,0xc0]
				; AVX-NEXT: ret{{[l\|q]}} # encoding: [0xc3]
				;
				; AVX512VL-LABEL: test_x86_avx_sqrt_ps_256:
				; AVX512VL: # %bb.0:
				; AVX512VL-NEXT: vsqrtps %ymm0, %ymm0 # EVEX TO VEX Compression encoding: [0xc5,0xfc,0x51,0xc0]
				; AVX512VL-NEXT: ret{{[l\|q]}} # encoding: [0xc3]
				%res = call <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float> %a0) ; <<8 x float>> [#uses=1]
				ret <8 x float> %res
				}
				declare <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float>) nounwind readnone

	define <4 x double> @test_x86_avx_vinsertf128_pd_256_1(<4 x double> %a0, <2 x double> %a1) {			define <4 x double> @test_x86_avx_vinsertf128_pd_256_1(<4 x double> %a0, <2 x double> %a1) {
	; AVX-LABEL: test_x86_avx_vinsertf128_pd_256_1:			; AVX-LABEL: test_x86_avx_vinsertf128_pd_256_1:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # encoding: [0xc4,0xe3,0x7d,0x18,0xc1,0x01]			; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # encoding: [0xc4,0xe3,0x7d,0x18,0xc1,0x01]
	; AVX-NEXT: ret{{[l\|q]}} # encoding: [0xc3]			; AVX-NEXT: ret{{[l\|q]}} # encoding: [0xc3]
	;			;
	; AVX512VL-LABEL: test_x86_avx_vinsertf128_pd_256_1:			; AVX512VL-LABEL: test_x86_avx_vinsertf128_pd_256_1:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:

llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86.ll

	Show First 20 Lines • Show All 616 Lines • ▼ Show 20 Lines
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vrsqrtps %ymm0, %ymm0 # encoding: [0xc5,0xfc,0x52,0xc0]			; CHECK-NEXT: vrsqrtps %ymm0, %ymm0 # encoding: [0xc5,0xfc,0x52,0xc0]
	; CHECK-NEXT: ret{{[l\|q]}} # encoding: [0xc3]			; CHECK-NEXT: ret{{[l\|q]}} # encoding: [0xc3]
	%res = call <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float> %a0) ; <<8 x float>> [#uses=1]			%res = call <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float> %a0) ; <<8 x float>> [#uses=1]
	ret <8 x float> %res			ret <8 x float> %res
	}			}
	declare <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float>) nounwind readnone			declare <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float>) nounwind readnone


	define <4 x double> @test_x86_avx_sqrt_pd_256(<4 x double> %a0) {
	; AVX-LABEL: test_x86_avx_sqrt_pd_256:
	; AVX: # %bb.0:
	; AVX-NEXT: vsqrtpd %ymm0, %ymm0 # encoding: [0xc5,0xfd,0x51,0xc0]
	; AVX-NEXT: ret{{[l\|q]}} # encoding: [0xc3]
	;
	; AVX512VL-LABEL: test_x86_avx_sqrt_pd_256:
	; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vsqrtpd %ymm0, %ymm0 # EVEX TO VEX Compression encoding: [0xc5,0xfd,0x51,0xc0]
	; AVX512VL-NEXT: ret{{[l\|q]}} # encoding: [0xc3]
	%res = call <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double> %a0) ; <<4 x double>> [#uses=1]
	ret <4 x double> %res
	}
	declare <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double>) nounwind readnone


	define <8 x float> @test_x86_avx_sqrt_ps_256(<8 x float> %a0) {
	; AVX-LABEL: test_x86_avx_sqrt_ps_256:
	; AVX: # %bb.0:
	; AVX-NEXT: vsqrtps %ymm0, %ymm0 # encoding: [0xc5,0xfc,0x51,0xc0]
	; AVX-NEXT: ret{{[l|q]}} # encoding: [0xc3]
	;
	; AVX512VL-LABEL: test_x86_avx_sqrt_ps_256:
	; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vsqrtps %ymm0, %ymm0 # EVEX TO VEX Compression encoding: [0xc5,0xfc,0x51,0xc0]
	; AVX512VL-NEXT: ret{{[l|q]}} # encoding: [0xc3]
	%res = call <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float> %a0) ; <<8 x float>> [#uses=1]
	ret <8 x float> %res
	}
	declare <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float>) nounwind readnone


	define <2 x double> @test_x86_avx_vpermilvar_pd(<2 x double> %a0, <2 x i64> %a1) {			define <2 x double> @test_x86_avx_vpermilvar_pd(<2 x double> %a0, <2 x i64> %a1) {
	; AVX-LABEL: test_x86_avx_vpermilvar_pd:			; AVX-LABEL: test_x86_avx_vpermilvar_pd:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpermilpd %xmm1, %xmm0, %xmm0 # encoding: [0xc4,0xe2,0x79,0x0d,0xc1]			; AVX-NEXT: vpermilpd %xmm1, %xmm0, %xmm0 # encoding: [0xc4,0xe2,0x79,0x0d,0xc1]
	; AVX-NEXT: ret{{[l|q]}} # encoding: [0xc3]			; AVX-NEXT: ret{{[l|q]}} # encoding: [0xc3]
	;			;
	; AVX512VL-LABEL: test_x86_avx_vpermilvar_pd:			; AVX512VL-LABEL: test_x86_avx_vpermilvar_pd:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:

llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll

	}			}


	declare <2 x double> @llvm.fma.v2f64(<2 x double>, <2 x double>, <2 x double>) #8			declare <2 x double> @llvm.fma.v2f64(<2 x double>, <2 x double>, <2 x double>) #8
	declare <4 x double> @llvm.fma.v4f64(<4 x double>, <4 x double>, <4 x double>) #8			declare <4 x double> @llvm.fma.v4f64(<4 x double>, <4 x double>, <4 x double>) #8
	declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>) #8			declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>) #8
	declare <8 x float> @llvm.fma.v8f32(<8 x float>, <8 x float>, <8 x float>) #8			declare <8 x float> @llvm.fma.v8f32(<8 x float>, <8 x float>, <8 x float>) #8

				define <2 x double> @test_mm_mask_sqrt_pd(<2 x double> %__W, i8 zeroext %__U, <2 x double> %__A) {
				; X32-LABEL: test_mm_mask_sqrt_pd:
				; X32: # %bb.0: # %entry
				; X32-NEXT: movb {{[0-9]+}}(%esp), %al
				; X32-NEXT: kmovw %eax, %k1
				; X32-NEXT: vsqrtpd %xmm1, %xmm0 {%k1}
				; X32-NEXT: retl
				;
				; X64-LABEL: test_mm_mask_sqrt_pd:
				; X64: # %bb.0: # %entry
				; X64-NEXT: kmovw %edi, %k1
				; X64-NEXT: vsqrtpd %xmm1, %xmm0 {%k1}
				; X64-NEXT: retq
				entry:
				%0 = tail call <2 x double> @llvm.sqrt.v2f64(<2 x double> %__A) #2
				%1 = bitcast i8 %__U to <8 x i1>
				%extract.i = shufflevector <8 x i1> %1, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
				%2 = select <2 x i1> %extract.i, <2 x double> %0, <2 x double> %__W
				ret <2 x double> %2
				}

				declare <2 x double> @llvm.sqrt.v2f64(<2 x double>)

				define <2 x double> @test_mm_maskz_sqrt_pd(i8 zeroext %__U, <2 x double> %__A) {
				; X32-LABEL: test_mm_maskz_sqrt_pd:
				; X32: # %bb.0: # %entry
				; X32-NEXT: movb {{[0-9]+}}(%esp), %al
				; X32-NEXT: kmovw %eax, %k1
				; X32-NEXT: vsqrtpd %xmm0, %xmm0 {%k1} {z}
				; X32-NEXT: retl
				;
				; X64-LABEL: test_mm_maskz_sqrt_pd:
				; X64: # %bb.0: # %entry
				; X64-NEXT: kmovw %edi, %k1
				; X64-NEXT: vsqrtpd %xmm0, %xmm0 {%k1} {z}
				; X64-NEXT: retq
				entry:
				%0 = tail call <2 x double> @llvm.sqrt.v2f64(<2 x double> %__A) #2
				%1 = bitcast i8 %__U to <8 x i1>
				%extract.i = shufflevector <8 x i1> %1, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
				%2 = select <2 x i1> %extract.i, <2 x double> %0, <2 x double> zeroinitializer
				ret <2 x double> %2
				}

				define <4 x double> @test_mm256_mask_sqrt_pd(<4 x double> %__W, i8 zeroext %__U, <4 x double> %__A) {
				; X32-LABEL: test_mm256_mask_sqrt_pd:
				; X32: # %bb.0: # %entry
				; X32-NEXT: movb {{[0-9]+}}(%esp), %al
				; X32-NEXT: kmovw %eax, %k1
				; X32-NEXT: vsqrtpd %ymm1, %ymm0 {%k1}
				; X32-NEXT: retl
				;
				; X64-LABEL: test_mm256_mask_sqrt_pd:
				; X64: # %bb.0: # %entry
				; X64-NEXT: kmovw %edi, %k1
				; X64-NEXT: vsqrtpd %ymm1, %ymm0 {%k1}
				; X64-NEXT: retq
				entry:
				%0 = tail call <4 x double> @llvm.sqrt.v4f64(<4 x double> %__A) #2
				%1 = bitcast i8 %__U to <8 x i1>
				%extract.i = shufflevector <8 x i1> %1, <8 x i1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				%2 = select <4 x i1> %extract.i, <4 x double> %0, <4 x double> %__W
				ret <4 x double> %2
				}

				declare <4 x double> @llvm.sqrt.v4f64(<4 x double>)

				define <4 x double> @test_mm256_maskz_sqrt_pd(i8 zeroext %__U, <4 x double> %__A) {
				; X32-LABEL: test_mm256_maskz_sqrt_pd:
				; X32: # %bb.0: # %entry
				; X32-NEXT: movb {{[0-9]+}}(%esp), %al
				; X32-NEXT: kmovw %eax, %k1
				; X32-NEXT: vsqrtpd %ymm0, %ymm0 {%k1} {z}
				; X32-NEXT: retl
				;
				; X64-LABEL: test_mm256_maskz_sqrt_pd:
				; X64: # %bb.0: # %entry
				; X64-NEXT: kmovw %edi, %k1
				; X64-NEXT: vsqrtpd %ymm0, %ymm0 {%k1} {z}
				; X64-NEXT: retq
				entry:
				%0 = tail call <4 x double> @llvm.sqrt.v4f64(<4 x double> %__A) #2
				%1 = bitcast i8 %__U to <8 x i1>
				%extract.i = shufflevector <8 x i1> %1, <8 x i1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				%2 = select <4 x i1> %extract.i, <4 x double> %0, <4 x double> zeroinitializer
				ret <4 x double> %2
				}

				define <4 x float> @test_mm_mask_sqrt_ps(<4 x float> %__W, i8 zeroext %__U, <4 x float> %__A) {
				; X32-LABEL: test_mm_mask_sqrt_ps:
				; X32: # %bb.0: # %entry
				; X32-NEXT: movb {{[0-9]+}}(%esp), %al
				; X32-NEXT: kmovw %eax, %k1
				; X32-NEXT: vsqrtps %xmm1, %xmm0 {%k1}
				; X32-NEXT: retl
				;
				; X64-LABEL: test_mm_mask_sqrt_ps:
				; X64: # %bb.0: # %entry
				; X64-NEXT: kmovw %edi, %k1
				; X64-NEXT: vsqrtps %xmm1, %xmm0 {%k1}
				; X64-NEXT: retq
				entry:
				%0 = tail call <4 x float> @llvm.sqrt.v4f32(<4 x float> %__A) #2
				%1 = bitcast i8 %__U to <8 x i1>
				%extract.i = shufflevector <8 x i1> %1, <8 x i1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				%2 = select <4 x i1> %extract.i, <4 x float> %0, <4 x float> %__W
				ret <4 x float> %2
				}

				declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)

				define <4 x float> @test_mm_maskz_sqrt_ps(i8 zeroext %__U, <4 x float> %__A) {
				; X32-LABEL: test_mm_maskz_sqrt_ps:
				; X32: # %bb.0: # %entry
				; X32-NEXT: movb {{[0-9]+}}(%esp), %al
				; X32-NEXT: kmovw %eax, %k1
				; X32-NEXT: vsqrtps %xmm0, %xmm0 {%k1} {z}
				; X32-NEXT: retl
				;
				; X64-LABEL: test_mm_maskz_sqrt_ps:
				; X64: # %bb.0: # %entry
				; X64-NEXT: kmovw %edi, %k1
				; X64-NEXT: vsqrtps %xmm0, %xmm0 {%k1} {z}
				; X64-NEXT: retq
				entry:
				%0 = tail call <4 x float> @llvm.sqrt.v4f32(<4 x float> %__A) #2
				%1 = bitcast i8 %__U to <8 x i1>
				%extract.i = shufflevector <8 x i1> %1, <8 x i1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				%2 = select <4 x i1> %extract.i, <4 x float> %0, <4 x float> zeroinitializer
				ret <4 x float> %2
				}

				define <8 x float> @test_mm256_mask_sqrt_ps(<8 x float> %__W, i8 zeroext %__U, <8 x float> %__A) {
				; X32-LABEL: test_mm256_mask_sqrt_ps:
				; X32: # %bb.0: # %entry
				; X32-NEXT: movb {{[0-9]+}}(%esp), %al
				; X32-NEXT: kmovw %eax, %k1
				; X32-NEXT: vsqrtps %ymm1, %ymm0 {%k1}
				; X32-NEXT: retl
				;
				; X64-LABEL: test_mm256_mask_sqrt_ps:
				; X64: # %bb.0: # %entry
				; X64-NEXT: kmovw %edi, %k1
				; X64-NEXT: vsqrtps %ymm1, %ymm0 {%k1}
				; X64-NEXT: retq
				entry:
				%0 = tail call <8 x float> @llvm.sqrt.v8f32(<8 x float> %__A) #2
				%1 = bitcast i8 %__U to <8 x i1>
				%2 = select <8 x i1> %1, <8 x float> %0, <8 x float> %__W
				ret <8 x float> %2
				}

				define <8 x float> @test_mm256_maskz_sqrt_ps(i8 zeroext %__U, <8 x float> %__A) {
				; X32-LABEL: test_mm256_maskz_sqrt_ps:
				; X32: # %bb.0: # %entry
				; X32-NEXT: movb {{[0-9]+}}(%esp), %al
				; X32-NEXT: kmovw %eax, %k1
				; X32-NEXT: vsqrtps %ymm0, %ymm0 {%k1} {z}
				; X32-NEXT: retl
				;
				; X64-LABEL: test_mm256_maskz_sqrt_ps:
				; X64: # %bb.0: # %entry
				; X64-NEXT: kmovw %edi, %k1
				; X64-NEXT: vsqrtps %ymm0, %ymm0 {%k1} {z}
				; X64-NEXT: retq
				entry:
				%0 = tail call <8 x float> @llvm.sqrt.v8f32(<8 x float> %__A) #2
				%1 = bitcast i8 %__U to <8 x i1>
				%2 = select <8 x i1> %1, <8 x float> %0, <8 x float> zeroinitializer
				ret <8 x float> %2
				}

				declare <8 x float> @llvm.sqrt.v8f32(<8 x float>)

				declare <4 x float> @llvm.x86.sse2.cvtdq2ps(<4 x i32>)
				declare <8 x float> @llvm.x86.avx.cvtdq2.ps.256(<8 x i32>)
	declare <4 x i32> @llvm.x86.avx512.mask.cvtpd2dq.128(<2 x double>, <4 x i32>, i8)			declare <4 x i32> @llvm.x86.avx512.mask.cvtpd2dq.128(<2 x double>, <4 x i32>, i8)
	declare <4 x i32> @llvm.x86.avx.cvt.pd2dq.256(<4 x double>)			declare <4 x i32> @llvm.x86.avx.cvt.pd2dq.256(<4 x double>)
	declare <4 x float> @llvm.x86.avx512.mask.cvtpd2ps(<2 x double>, <4 x float>, i8)			declare <4 x float> @llvm.x86.avx512.mask.cvtpd2ps(<2 x double>, <4 x float>, i8)
	declare <4 x float> @llvm.x86.avx.cvt.pd2.ps.256(<4 x double>)			declare <4 x float> @llvm.x86.avx.cvt.pd2.ps.256(<4 x double>)
	declare <4 x i32> @llvm.x86.avx512.mask.cvtpd2udq.128(<2 x double>, <4 x i32>, i8)			declare <4 x i32> @llvm.x86.avx512.mask.cvtpd2udq.128(<2 x double>, <4 x i32>, i8)
	declare <4 x i32> @llvm.x86.avx512.mask.cvtpd2udq.256(<4 x double>, <4 x i32>, i8)			declare <4 x i32> @llvm.x86.avx512.mask.cvtpd2udq.256(<4 x double>, <4 x i32>, i8)
	declare <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)			declare <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
	declare <8 x i32> @llvm.x86.avx.cvt.ps2dq.256(<8 x float>)			declare <8 x i32> @llvm.x86.avx.cvt.ps2dq.256(<8 x float>)

llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-upgrade.ll

	; X64-LABEL: test_expand_load_d_256:			; X64-LABEL: test_expand_load_d_256:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: kxnorw %k0, %k0, %k1 # encoding: [0xc5,0xfc,0x46,0xc8]			; X64-NEXT: kxnorw %k0, %k0, %k1 # encoding: [0xc5,0xfc,0x46,0xc8]
	; X64-NEXT: vpexpandd (%rdi), %ymm0 {%k1} # encoding: [0x62,0xf2,0x7d,0x29,0x89,0x07]			; X64-NEXT: vpexpandd (%rdi), %ymm0 {%k1} # encoding: [0x62,0xf2,0x7d,0x29,0x89,0x07]
	; X64-NEXT: retq # encoding: [0xc3]			; X64-NEXT: retq # encoding: [0xc3]
	%res = call <8 x i32> @llvm.x86.avx512.mask.expand.load.d.256(i8* %addr, <8 x i32> %data, i8 -1)			%res = call <8 x i32> @llvm.x86.avx512.mask.expand.load.d.256(i8* %addr, <8 x i32> %data, i8 -1)
	ret <8 x i32> %res			ret <8 x i32> %res
	}			}

				define <4 x double> @test_sqrt_pd_256(<4 x double> %a0, i8 %mask) {
				; X86-LABEL: test_sqrt_pd_256:
				; X86: # %bb.0:
				; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax # encoding: [0x0f,0xb6,0x44,0x24,0x04]
				; X86-NEXT: kmovw %eax, %k1 # encoding: [0xc5,0xf8,0x92,0xc8]
				; X86-NEXT: vsqrtpd %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf1,0xfd,0xa9,0x51,0xc0]
				; X86-NEXT: retl # encoding: [0xc3]
				;
				; X64-LABEL: test_sqrt_pd_256:
				; X64: # %bb.0:
				; X64-NEXT: kmovw %edi, %k1 # encoding: [0xc5,0xf8,0x92,0xcf]
				; X64-NEXT: vsqrtpd %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf1,0xfd,0xa9,0x51,0xc0]
				; X64-NEXT: retq # encoding: [0xc3]
				%res = call <4 x double> @llvm.x86.avx512.mask.sqrt.pd.256(<4 x double> %a0, <4 x double> zeroinitializer, i8 %mask)
				ret <4 x double> %res
				}
				declare <4 x double> @llvm.x86.avx512.mask.sqrt.pd.256(<4 x double>, <4 x double>, i8) nounwind readnone

				define <8 x float> @test_sqrt_ps_256(<8 x float> %a0, i8 %mask) {
				; X86-LABEL: test_sqrt_ps_256:
				; X86: # %bb.0:
				; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax # encoding: [0x0f,0xb6,0x44,0x24,0x04]
				; X86-NEXT: kmovw %eax, %k1 # encoding: [0xc5,0xf8,0x92,0xc8]
				; X86-NEXT: vsqrtps %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf1,0x7c,0xa9,0x51,0xc0]
				; X86-NEXT: retl # encoding: [0xc3]
				;
				; X64-LABEL: test_sqrt_ps_256:
				; X64: # %bb.0:
				; X64-NEXT: kmovw %edi, %k1 # encoding: [0xc5,0xf8,0x92,0xcf]
				; X64-NEXT: vsqrtps %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf1,0x7c,0xa9,0x51,0xc0]
				; X64-NEXT: retq # encoding: [0xc3]
				%res = call <8 x float> @llvm.x86.avx512.mask.sqrt.ps.256(<8 x float> %a0, <8 x float> zeroinitializer, i8 %mask)
				ret <8 x float> %res
				}

				declare <8 x float> @llvm.x86.avx512.mask.sqrt.ps.256(<8 x float>, <8 x float>, i8) nounwind readnone

llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics.ll

	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vminps %xmm1, %xmm0, %xmm0 # EVEX TO VEX Compression encoding: [0xc5,0xf8,0x5d,0xc1]			; CHECK-NEXT: vminps %xmm1, %xmm0, %xmm0 # EVEX TO VEX Compression encoding: [0xc5,0xf8,0x5d,0xc1]
	; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]			; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
	%1 = call <4 x float> @llvm.x86.sse.min.ps(<4 x float> %a0, <4 x float> %a1)			%1 = call <4 x float> @llvm.x86.sse.min.ps(<4 x float> %a0, <4 x float> %a1)
	ret <4 x float> %1			ret <4 x float> %1
	}			}
	declare <4 x float> @llvm.x86.sse.min.ps(<4 x float>, <4 x float>)			declare <4 x float> @llvm.x86.sse.min.ps(<4 x float>, <4 x float>)

	define <4 x double> @test_sqrt_pd_256(<4 x double> %a0, i8 %mask) {
	; X86-LABEL: test_sqrt_pd_256:
	; X86: # %bb.0:
	; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax # encoding: [0x0f,0xb6,0x44,0x24,0x04]
	; X86-NEXT: kmovw %eax, %k1 # encoding: [0xc5,0xf8,0x92,0xc8]
	; X86-NEXT: vsqrtpd %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf1,0xfd,0xa9,0x51,0xc0]
	; X86-NEXT: retl # encoding: [0xc3]
	;
	; X64-LABEL: test_sqrt_pd_256:
	; X64: # %bb.0:
	; X64-NEXT: kmovw %edi, %k1 # encoding: [0xc5,0xf8,0x92,0xcf]
	; X64-NEXT: vsqrtpd %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf1,0xfd,0xa9,0x51,0xc0]
	; X64-NEXT: retq # encoding: [0xc3]
	%res = call <4 x double> @llvm.x86.avx512.mask.sqrt.pd.256(<4 x double> %a0, <4 x double> zeroinitializer, i8 %mask)
	ret <4 x double> %res
	}
	declare <4 x double> @llvm.x86.avx512.mask.sqrt.pd.256(<4 x double>, <4 x double>, i8) nounwind readnone

	define <8 x float> @test_sqrt_ps_256(<8 x float> %a0, i8 %mask) {
	; X86-LABEL: test_sqrt_ps_256:
	; X86: # %bb.0:
	; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax # encoding: [0x0f,0xb6,0x44,0x24,0x04]
	; X86-NEXT: kmovw %eax, %k1 # encoding: [0xc5,0xf8,0x92,0xc8]
	; X86-NEXT: vsqrtps %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf1,0x7c,0xa9,0x51,0xc0]
	; X86-NEXT: retl # encoding: [0xc3]
	;
	; X64-LABEL: test_sqrt_ps_256:
	; X64: # %bb.0:
	; X64-NEXT: kmovw %edi, %k1 # encoding: [0xc5,0xf8,0x92,0xcf]
	; X64-NEXT: vsqrtps %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf1,0x7c,0xa9,0x51,0xc0]
	; X64-NEXT: retq # encoding: [0xc3]
	%res = call <8 x float> @llvm.x86.avx512.mask.sqrt.ps.256(<8 x float> %a0, <8 x float> zeroinitializer, i8 %mask)
	ret <8 x float> %res
	}

	declare <8 x float> @llvm.x86.avx512.mask.sqrt.ps.256(<8 x float>, <8 x float>, i8) nounwind readnone

	define <4 x double> @test_getexp_pd_256(<4 x double> %a0) {			define <4 x double> @test_getexp_pd_256(<4 x double> %a0) {
	; CHECK-LABEL: test_getexp_pd_256:			; CHECK-LABEL: test_getexp_pd_256:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vgetexppd %ymm0, %ymm0 # encoding: [0x62,0xf2,0xfd,0x28,0x42,0xc0]			; CHECK-NEXT: vgetexppd %ymm0, %ymm0 # encoding: [0x62,0xf2,0xfd,0x28,0x42,0xc0]
	; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]			; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
	%res = call <4 x double> @llvm.x86.avx512.mask.getexp.pd.256(<4 x double> %a0, <4 x double> zeroinitializer, i8 -1)			%res = call <4 x double> @llvm.x86.avx512.mask.getexp.pd.256(<4 x double> %a0, <4 x double> zeroinitializer, i8 -1)
	ret <4 x double> %res			ret <4 x double> %res
	}			}

llvm/trunk/test/CodeGen/X86/fold-load-unops.ll

; AVX-NEXT: retq
%res = tail call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %ins)		%res = tail call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %ins)
%ext = extractelement <4 x float> %res, i32 0		%ext = extractelement <4 x float> %res, i32 0
ret float %ext		ret float %ext
}		}

define <4 x float> @sqrtss_full_size(<4 x float>* %a) optsize{		define <4 x float> @sqrtss_full_size(<4 x float>* %a) optsize{
; SSE-LABEL: sqrtss_full_size:		; SSE-LABEL: sqrtss_full_size:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: sqrtss (%rdi), %xmm0		; SSE-NEXT: movaps (%rdi), %xmm0
		; SSE-NEXT: sqrtss %xmm0, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: sqrtss_full_size:		; AVX-LABEL: sqrtss_full_size:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vsqrtss (%rdi), %xmm0, %xmm0		; AVX-NEXT: vmovaps (%rdi), %xmm0
		; AVX-NEXT: vsqrtss %xmm0, %xmm0, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%ld = load <4 x float>, <4 x float>* %a		%ld = load <4 x float>, <4 x float>* %a
%res = tail call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %ld)		%res = tail call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %ld)
ret <4 x float> %res		ret <4 x float> %res
}		}

define double @sqrtsd_size(double* %a) optsize {		define double @sqrtsd_size(double* %a) optsize {
; SSE-LABEL: sqrtsd_size:		; SSE-LABEL: sqrtsd_size:
; AVX-NEXT: retq
%res = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %ins)		%res = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %ins)
%ext = extractelement <2 x double> %res, i32 0		%ext = extractelement <2 x double> %res, i32 0
ret double %ext		ret double %ext
}		}

define <2 x double> @sqrtsd_full_size(<2 x double>* %a) optsize {		define <2 x double> @sqrtsd_full_size(<2 x double>* %a) optsize {
; SSE-LABEL: sqrtsd_full_size:		; SSE-LABEL: sqrtsd_full_size:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: sqrtsd (%rdi), %xmm0		; SSE-NEXT: movapd (%rdi), %xmm0
		; SSE-NEXT: sqrtsd %xmm0, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: sqrtsd_full_size:		; AVX-LABEL: sqrtsd_full_size:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vsqrtsd (%rdi), %xmm0, %xmm0		; AVX-NEXT: vmovapd (%rdi), %xmm0
		; AVX-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%ld = load <2 x double>, <2 x double>* %a		%ld = load <2 x double>, <2 x double>* %a
%res = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %ld)		%res = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %ld)
ret <2 x double> %res		ret <2 x double> %res
}		}

declare <4 x float> @llvm.x86.sse.rcp.ss(<4 x float>) nounwind readnone		declare <4 x float> @llvm.x86.sse.rcp.ss(<4 x float>) nounwind readnone
declare <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float>) nounwind readnone		declare <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float>) nounwind readnone
declare <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float>) nounwind readnone		declare <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float>) nounwind readnone
declare <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double>) nounwind readnone		declare <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double>) nounwind readnone

llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel.ll

	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: sqrtps %xmm0, %xmm0			; SSE-NEXT: sqrtps %xmm0, %xmm0
	; SSE-NEXT: ret{{[l|q]}}			; SSE-NEXT: ret{{[l|q]}}
	;			;
	; AVX-LABEL: test_mm_sqrt_ps:			; AVX-LABEL: test_mm_sqrt_ps:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vsqrtps %xmm0, %xmm0			; AVX-NEXT: vsqrtps %xmm0, %xmm0
	; AVX-NEXT: ret{{[l|q]}}			; AVX-NEXT: ret{{[l|q]}}
	%res = call <4 x float> @llvm.x86.sse.sqrt.ps(<4 x float> %a0)			%res = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a0)
	ret <4 x float> %res			ret <4 x float> %res
	}			}
	declare <4 x float> @llvm.x86.sse.sqrt.ps(<4 x float>) nounwind readnone			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>) nounwind readnone

	define <4 x float> @test_mm_sqrt_ss(<4 x float> %a0) {			define <4 x float> @test_mm_sqrt_ss(<4 x float> %a0) {
	; SSE-LABEL: test_mm_sqrt_ss:			; SSE-LABEL: test_mm_sqrt_ss:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: sqrtss %xmm0, %xmm0			; SSE-NEXT: sqrtss %xmm0, %xmm0
	; SSE-NEXT: ret{{[l|q]}}			; SSE-NEXT: ret{{[l|q]}}
	;			;
	; AVX-LABEL: test_mm_sqrt_ss:			; AVX-LABEL: test_mm_sqrt_ss:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vsqrtss %xmm0, %xmm0, %xmm0			; AVX-NEXT: vsqrtss %xmm0, %xmm0, %xmm0
	; AVX-NEXT: ret{{[l|q]}}			; AVX-NEXT: ret{{[l|q]}}
	%sqrt = call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %a0)			%ext = extractelement <4 x float> %a0, i32 0
	ret <4 x float> %sqrt			%sqrt = call float @llvm.sqrt.f32(float %ext)
				%ins = insertelement <4 x float> %a0, float %sqrt, i32 0
				ret <4 x float> %ins
	}			}
	declare <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float>) nounwind readnone			declare float @llvm.sqrt.f32(float) nounwind readnone

	define void @test_mm_store_ps(float *%a0, <4 x float> %a1) {			define void @test_mm_store_ps(float *%a0, <4 x float> %a1) {
	; X86-SSE-LABEL: test_mm_store_ps:			; X86-SSE-LABEL: test_mm_store_ps:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SSE-NEXT: movaps %xmm0, (%eax)			; X86-SSE-NEXT: movaps %xmm0, (%eax)
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	;			;

llvm/trunk/test/CodeGen/X86/sse-intrinsics-x86-upgrade.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i386-apple-darwin -mattr=+sse -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,SSE,X86-SSE			; RUN: llc < %s -mtriple=i386-apple-darwin -mattr=+sse -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,SSE,X86-SSE
	; RUN: llc < %s -mtriple=i386-apple-darwin -mattr=+avx -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,AVX,X86-AVX,AVX1,X86-AVX1			; RUN: llc < %s -mtriple=i386-apple-darwin -mattr=+avx -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,AVX,X86-AVX,AVX1,X86-AVX1
	; RUN: llc < %s -mtriple=i386-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,AVX,X86-AVX,AVX512,X86-AVX512			; RUN: llc < %s -mtriple=i386-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,AVX,X86-AVX,AVX512,X86-AVX512
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=-sse2 -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,SSE,X64-SSE			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=-sse2 -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,SSE,X64-SSE
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,AVX,X64-AVX,AVX1,X64-AVX1			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,AVX,X64-AVX,AVX1,X64-AVX1
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,AVX,X64-AVX,AVX512,X64-AVX512			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,AVX,X64-AVX,AVX512,X64-AVX512


				define <4 x float> @test_x86_sse_sqrt_ps(<4 x float> %a0) {
				; SSE-LABEL: test_x86_sse_sqrt_ps:
				; SSE: ## %bb.0:
				; SSE-NEXT: sqrtps %xmm0, %xmm0 ## encoding: [0x0f,0x51,0xc0]
				; SSE-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
				;
				; AVX1-LABEL: test_x86_sse_sqrt_ps:
				; AVX1: ## %bb.0:
				; AVX1-NEXT: vsqrtps %xmm0, %xmm0 ## encoding: [0xc5,0xf8,0x51,0xc0]
				; AVX1-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
				;
				; AVX512-LABEL: test_x86_sse_sqrt_ps:
				; AVX512: ## %bb.0:
				; AVX512-NEXT: vsqrtps %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf8,0x51,0xc0]
				; AVX512-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
				%res = call <4 x float> @llvm.x86.sse.sqrt.ps(<4 x float> %a0) ; <<4 x float>> [#uses=1]
				ret <4 x float> %res
				}
				declare <4 x float> @llvm.x86.sse.sqrt.ps(<4 x float>) nounwind readnone


				define <4 x float> @test_x86_sse_sqrt_ss(<4 x float> %a0) {
				; SSE-LABEL: test_x86_sse_sqrt_ss:
				; SSE: ## %bb.0:
				; SSE-NEXT: sqrtss %xmm0, %xmm0 ## encoding: [0xf3,0x0f,0x51,0xc0]
				; SSE-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
				;
				; AVX-LABEL: test_x86_sse_sqrt_ss:
				; AVX: ## %bb.0:
				; AVX-NEXT: vsqrtss %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfa,0x51,0xc0]
				; AVX-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
				%res = call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %a0) ; <<4 x float>> [#uses=1]
				ret <4 x float> %res
				}
				declare <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float>) nounwind readnone


	define void @test_x86_sse_storeu_ps(i8* %a0, <4 x float> %a1) {			define void @test_x86_sse_storeu_ps(i8* %a0, <4 x float> %a1) {
	; X86-SSE-LABEL: test_x86_sse_storeu_ps:			; X86-SSE-LABEL: test_x86_sse_storeu_ps:
	; X86-SSE: ## %bb.0:			; X86-SSE: ## %bb.0:
	; X86-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]			; X86-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]
	; X86-SSE-NEXT: movups %xmm0, (%eax) ## encoding: [0x0f,0x11,0x00]			; X86-SSE-NEXT: movups %xmm0, (%eax) ## encoding: [0x0f,0x11,0x00]
	; X86-SSE-NEXT: retl ## encoding: [0xc3]			; X86-SSE-NEXT: retl ## encoding: [0xc3]
	;			;
	; X86-AVX1-LABEL: test_x86_sse_storeu_ps:			; X86-AVX1-LABEL: test_x86_sse_storeu_ps:

llvm/trunk/test/CodeGen/X86/sse-intrinsics-x86.ll

	; AVX-NEXT: vrsqrtss %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfa,0x52,0xc0]			; AVX-NEXT: vrsqrtss %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfa,0x52,0xc0]
	; AVX-NEXT: ret{{[l|q]}} ## encoding: [0xc3]			; AVX-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
	%res = call <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float> %a0) ; <<4 x float>> [#uses=1]			%res = call <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float> %a0) ; <<4 x float>> [#uses=1]
	ret <4 x float> %res			ret <4 x float> %res
	}			}
	declare <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float>) nounwind readnone			declare <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float>) nounwind readnone


	define <4 x float> @test_x86_sse_sqrt_ps(<4 x float> %a0) {
	; SSE-LABEL: test_x86_sse_sqrt_ps:
	; SSE: ## %bb.0:
	; SSE-NEXT: sqrtps %xmm0, %xmm0 ## encoding: [0x0f,0x51,0xc0]
	; SSE-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
	;
	; AVX1-LABEL: test_x86_sse_sqrt_ps:
	; AVX1: ## %bb.0:
	; AVX1-NEXT: vsqrtps %xmm0, %xmm0 ## encoding: [0xc5,0xf8,0x51,0xc0]
	; AVX1-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
	;
	; AVX512-LABEL: test_x86_sse_sqrt_ps:
	; AVX512: ## %bb.0:
	; AVX512-NEXT: vsqrtps %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf8,0x51,0xc0]
	; AVX512-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
	%res = call <4 x float> @llvm.x86.sse.sqrt.ps(<4 x float> %a0) ; <<4 x float>> [#uses=1]
	ret <4 x float> %res
	}
	declare <4 x float> @llvm.x86.sse.sqrt.ps(<4 x float>) nounwind readnone


	define <4 x float> @test_x86_sse_sqrt_ss(<4 x float> %a0) {
	; SSE-LABEL: test_x86_sse_sqrt_ss:
	; SSE: ## %bb.0:
	; SSE-NEXT: sqrtss %xmm0, %xmm0 ## encoding: [0xf3,0x0f,0x51,0xc0]
	; SSE-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
	;
	; AVX1-LABEL: test_x86_sse_sqrt_ss:
	; AVX1: ## %bb.0:
	; AVX1-NEXT: vsqrtss %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfa,0x51,0xc0]
	; AVX1-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
	;
	; AVX512-LABEL: test_x86_sse_sqrt_ss:
	; AVX512: ## %bb.0:
	; AVX512-NEXT: vsqrtss %xmm0, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xfa,0x51,0xc0]
	; AVX512-NEXT: ret{{[l|q]}} ## encoding: [0xc3]
	%res = call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %a0) ; <<4 x float>> [#uses=1]
	ret <4 x float> %res
	}
	declare <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float>) nounwind readnone


	define void @test_x86_sse_stmxcsr(i8* %a0) {			define void @test_x86_sse_stmxcsr(i8* %a0) {
	; X86-SSE-LABEL: test_x86_sse_stmxcsr:			; X86-SSE-LABEL: test_x86_sse_stmxcsr:
	; X86-SSE: ## %bb.0:			; X86-SSE: ## %bb.0:
	; X86-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]			; X86-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]
	; X86-SSE-NEXT: stmxcsr (%eax) ## encoding: [0x0f,0xae,0x18]			; X86-SSE-NEXT: stmxcsr (%eax) ## encoding: [0x0f,0xae,0x18]
	; X86-SSE-NEXT: retl ## encoding: [0xc3]			; X86-SSE-NEXT: retl ## encoding: [0xc3]
	;			;
	; X86-AVX-LABEL: test_x86_sse_stmxcsr:			; X86-AVX-LABEL: test_x86_sse_stmxcsr:

llvm/trunk/test/CodeGen/X86/sse-scalar-fp-arith.ll

; AVX-NEXT: ret{{[l|q]}}
%div = fdiv float %2, %1		%div = fdiv float %2, %1
%3 = insertelement <4 x float> %a, float %div, i32 0		%3 = insertelement <4 x float> %a, float %div, i32 0
ret <4 x float> %3		ret <4 x float> %3
}		}

define <4 x float> @test_sqrt_ss(<4 x float> %a) {		define <4 x float> @test_sqrt_ss(<4 x float> %a) {
; SSE2-LABEL: test_sqrt_ss:		; SSE2-LABEL: test_sqrt_ss:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: sqrtss %xmm0, %xmm1		; SSE2-NEXT: sqrtss %xmm0, %xmm0
; SSE2-NEXT: movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
; SSE2-NEXT: ret{{[l|q]}}		; SSE2-NEXT: ret{{[l|q]}}
;		;
; SSE41-LABEL: test_sqrt_ss:		; SSE41-LABEL: test_sqrt_ss:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: sqrtss %xmm0, %xmm1		; SSE41-NEXT: sqrtss %xmm0, %xmm0
; SSE41-NEXT: blendps {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
; SSE41-NEXT: ret{{[l|q]}}		; SSE41-NEXT: ret{{[l|q]}}
;		;
; AVX1-LABEL: test_sqrt_ss:		; AVX1-LABEL: test_sqrt_ss:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vsqrtss %xmm0, %xmm0, %xmm1		; AVX1-NEXT: vsqrtss %xmm0, %xmm0, %xmm0
; AVX1-NEXT: vblendps {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
; AVX1-NEXT: ret{{[l|q]}}		; AVX1-NEXT: ret{{[l|q]}}
;		;
; AVX512-LABEL: test_sqrt_ss:		; AVX512-LABEL: test_sqrt_ss:
; AVX512: # %bb.0:		; AVX512: # %bb.0:
; AVX512-NEXT: vsqrtss %xmm0, %xmm0, %xmm1		; AVX512-NEXT: vsqrtss %xmm0, %xmm0, %xmm0
; AVX512-NEXT: vmovss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
; AVX512-NEXT: ret{{[l|q]}}		; AVX512-NEXT: ret{{[l|q]}}
%1 = extractelement <4 x float> %a, i32 0		%1 = extractelement <4 x float> %a, i32 0
%2 = call float @llvm.sqrt.f32(float %1)		%2 = call float @llvm.sqrt.f32(float %1)
%3 = insertelement <4 x float> %a, float %2, i32 0		%3 = insertelement <4 x float> %a, float %2, i32 0
ret <4 x float> %3		ret <4 x float> %3
}		}
declare float @llvm.sqrt.f32(float)		declare float @llvm.sqrt.f32(float)

; AVX-NEXT: ret{{[l|q]}}
%div = fdiv double %2, %1		%div = fdiv double %2, %1
%3 = insertelement <2 x double> %a, double %div, i32 0		%3 = insertelement <2 x double> %a, double %div, i32 0
ret <2 x double> %3		ret <2 x double> %3
}		}

define <2 x double> @test_sqrt_sd(<2 x double> %a) {		define <2 x double> @test_sqrt_sd(<2 x double> %a) {
; SSE2-LABEL: test_sqrt_sd:		; SSE2-LABEL: test_sqrt_sd:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: sqrtsd %xmm0, %xmm1		; SSE2-NEXT: sqrtsd %xmm0, %xmm0
; SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE2-NEXT: ret{{[l|q]}}		; SSE2-NEXT: ret{{[l|q]}}
;		;
; SSE41-LABEL: test_sqrt_sd:		; SSE41-LABEL: test_sqrt_sd:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: sqrtsd %xmm0, %xmm1		; SSE41-NEXT: sqrtsd %xmm0, %xmm0
; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; SSE41-NEXT: ret{{[l\|q]}}		; SSE41-NEXT: ret{{[l\|q]}}
;		;
; AVX1-LABEL: test_sqrt_sd:		; AVX1-LABEL: test_sqrt_sd:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vsqrtsd %xmm0, %xmm0, %xmm1		; AVX1-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0
; AVX1-NEXT: vblendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; AVX1-NEXT: ret{{[l\|q]}}		; AVX1-NEXT: ret{{[l\|q]}}
;		;
; AVX512-LABEL: test_sqrt_sd:		; AVX512-LABEL: test_sqrt_sd:
; AVX512: # %bb.0:		; AVX512: # %bb.0:
; AVX512-NEXT: vsqrtsd %xmm0, %xmm0, %xmm1		; AVX512-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0
; AVX512-NEXT: vmovsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
; AVX512-NEXT: ret{{[l\|q]}}		; AVX512-NEXT: ret{{[l\|q]}}
%1 = extractelement <2 x double> %a, i32 0		%1 = extractelement <2 x double> %a, i32 0
%2 = call double @llvm.sqrt.f64(double %1)		%2 = call double @llvm.sqrt.f64(double %1)
%3 = insertelement <2 x double> %a, double %2, i32 0		%3 = insertelement <2 x double> %a, double %2, i32 0
ret <2 x double> %3		ret <2 x double> %3
}		}
declare double @llvm.sqrt.f64(double)		declare double @llvm.sqrt.f64(double)


llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll

	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: sqrtpd %xmm0, %xmm0			; SSE-NEXT: sqrtpd %xmm0, %xmm0
	; SSE-NEXT: ret{{[l\|q]}}			; SSE-NEXT: ret{{[l\|q]}}
	;			;
	; AVX-LABEL: test_mm_sqrt_pd:			; AVX-LABEL: test_mm_sqrt_pd:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vsqrtpd %xmm0, %xmm0			; AVX-NEXT: vsqrtpd %xmm0, %xmm0
	; AVX-NEXT: ret{{[l\|q]}}			; AVX-NEXT: ret{{[l\|q]}}
	%res = call <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double> %a0)			%res = call <2 x double> @llvm.sqrt.v2f64(<2 x double> %a0)
	ret <2 x double> %res			ret <2 x double> %res
	}			}
	declare <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double>) nounwind readnone			declare <2 x double> @llvm.sqrt.v2f64(<2 x double>) nounwind readnone

	define <2 x double> @test_mm_sqrt_sd(<2 x double> %a0, <2 x double> %a1) nounwind {			define <2 x double> @test_mm_sqrt_sd(<2 x double> %a0, <2 x double> %a1) nounwind {
	; SSE-LABEL: test_mm_sqrt_sd:			; SSE-LABEL: test_mm_sqrt_sd:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: sqrtsd %xmm0, %xmm1			; SSE-NEXT: sqrtsd %xmm0, %xmm1
	; SSE-NEXT: movapd %xmm1, %xmm0			; SSE-NEXT: movapd %xmm1, %xmm0
	; SSE-NEXT: ret{{[l\|q]}}			; SSE-NEXT: ret{{[l\|q]}}
	;			;
	; AVX-LABEL: test_mm_sqrt_sd:			; AVX-LABEL: test_mm_sqrt_sd:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vsqrtsd %xmm0, %xmm1, %xmm0			; AVX-NEXT: vsqrtsd %xmm0, %xmm1, %xmm0
	; AVX-NEXT: ret{{[l\|q]}}			; AVX-NEXT: ret{{[l\|q]}}
	%call = call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a0)			%ext = extractelement <2 x double> %a0, i32 0
	%ext0 = extractelement <2 x double> %call, i32 0			%sqrt = call double @llvm.sqrt.f64(double %ext)
	%ins0 = insertelement <2 x double> undef, double %ext0, i32 0			%ins = insertelement <2 x double> %a1, double %sqrt, i32 0
	%ext1 = extractelement <2 x double> %a1, i32 1			ret <2 x double> %ins
	%ins1 = insertelement <2 x double> %ins0, double %ext1, i32 1
	ret <2 x double> %ins1
	}			}
	declare <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double>) nounwind readnone			declare double @llvm.sqrt.f64(double) nounwind readnone

	define <2 x i64> @test_mm_sra_epi16(<2 x i64> %a0, <2 x i64> %a1) {			define <2 x i64> @test_mm_sra_epi16(<2 x i64> %a0, <2 x i64> %a1) {
	; SSE-LABEL: test_mm_sra_epi16:			; SSE-LABEL: test_mm_sra_epi16:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: psraw %xmm1, %xmm0			; SSE-NEXT: psraw %xmm1, %xmm0
	; SSE-NEXT: ret{{[l\|q]}}			; SSE-NEXT: ret{{[l\|q]}}
	;			;
	; AVX-LABEL: test_mm_sra_epi16:			; AVX-LABEL: test_mm_sra_epi16:

llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -disable-peephole -mtriple=i386-apple-darwin -mattr=+sse2 -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,SSE,X86-SSE			; RUN: llc < %s -disable-peephole -mtriple=i386-apple-darwin -mattr=+sse2 -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,SSE,X86-SSE
	; RUN: llc < %s -disable-peephole -mtriple=i386-apple-darwin -mattr=+avx -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,AVX,X86-AVX,AVX1,X86-AVX1			; RUN: llc < %s -disable-peephole -mtriple=i386-apple-darwin -mattr=+avx -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,AVX,X86-AVX,AVX1,X86-AVX1
	; RUN: llc < %s -disable-peephole -mtriple=i386-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,AVX,X86-AVX,AVX512,X86-AVX512			; RUN: llc < %s -disable-peephole -mtriple=i386-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X86,AVX,X86-AVX,AVX512,X86-AVX512
	; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin -mattr=+sse2 -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,SSE,X64-SSE			; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin -mattr=+sse2 -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,SSE,X64-SSE
	; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin -mattr=+avx -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,AVX,X64-AVX,AVX1,X64-AVX1			; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin -mattr=+avx -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,AVX,X64-AVX,AVX1,X64-AVX1
	; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,AVX,X64-AVX,AVX512,X64-AVX512			; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin -mattr=+avx512f,+avx512bw,+avx512dq,+avx512vl -show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64,AVX,X64-AVX,AVX512,X64-AVX512


				define <2 x double> @test_x86_sse2_sqrt_pd(<2 x double> %a0) {
				; SSE-LABEL: test_x86_sse2_sqrt_pd:
				; SSE: ## %bb.0:
				; SSE-NEXT: sqrtpd %xmm0, %xmm0 ## encoding: [0x66,0x0f,0x51,0xc0]
				; SSE-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
				;
				; AVX1-LABEL: test_x86_sse2_sqrt_pd:
				; AVX1: ## %bb.0:
				; AVX1-NEXT: vsqrtpd %xmm0, %xmm0 ## encoding: [0xc5,0xf9,0x51,0xc0]
				; AVX1-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
				;
				; AVX512-LABEL: test_x86_sse2_sqrt_pd:
				; AVX512: ## %bb.0:
				; AVX512-NEXT: vsqrtpd %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x51,0xc0]
				; AVX512-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
				%res = call <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double> %a0) ; <<2 x double>> [#uses=1]
				ret <2 x double> %res
				}
				declare <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double>) nounwind readnone


				define <2 x double> @test_x86_sse2_sqrt_sd(<2 x double> %a0) {
				; SSE-LABEL: test_x86_sse2_sqrt_sd:
				; SSE: ## %bb.0:
				; SSE-NEXT: sqrtsd %xmm0, %xmm0 ## encoding: [0xf2,0x0f,0x51,0xc0]
				; SSE-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
				;
				; AVX-LABEL: test_x86_sse2_sqrt_sd:
				; AVX: ## %bb.0:
				; AVX-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfb,0x51,0xc0]
				; AVX-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
				%res = call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a0) ; <<2 x double>> [#uses=1]
				ret <2 x double> %res
				}
				declare <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double>) nounwind readnone


				define <2 x double> @test_x86_sse2_sqrt_sd_vec_load(<2 x double>* %a0) {
				; X86-SSE-LABEL: test_x86_sse2_sqrt_sd_vec_load:
				; X86-SSE: ## %bb.0:
				; X86-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]
				; X86-SSE-NEXT: movapd (%eax), %xmm0 ## encoding: [0x66,0x0f,0x28,0x00]
				; X86-SSE-NEXT: sqrtsd %xmm0, %xmm0 ## encoding: [0xf2,0x0f,0x51,0xc0]
				; X86-SSE-NEXT: retl ## encoding: [0xc3]
				;
				; X86-AVX1-LABEL: test_x86_sse2_sqrt_sd_vec_load:
				; X86-AVX1: ## %bb.0:
				; X86-AVX1-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]
				; X86-AVX1-NEXT: vmovapd (%eax), %xmm0 ## encoding: [0xc5,0xf9,0x28,0x00]
				; X86-AVX1-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfb,0x51,0xc0]
				; X86-AVX1-NEXT: retl ## encoding: [0xc3]
				;
				; X86-AVX512-LABEL: test_x86_sse2_sqrt_sd_vec_load:
				; X86-AVX512: ## %bb.0:
				; X86-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]
				; X86-AVX512-NEXT: vmovapd (%eax), %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x28,0x00]
				; X86-AVX512-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfb,0x51,0xc0]
				; X86-AVX512-NEXT: retl ## encoding: [0xc3]
				;
				; X64-SSE-LABEL: test_x86_sse2_sqrt_sd_vec_load:
				; X64-SSE: ## %bb.0:
				; X64-SSE-NEXT: movapd (%rdi), %xmm0 ## encoding: [0x66,0x0f,0x28,0x07]
				; X64-SSE-NEXT: sqrtsd %xmm0, %xmm0 ## encoding: [0xf2,0x0f,0x51,0xc0]
				; X64-SSE-NEXT: retq ## encoding: [0xc3]
				;
				; X64-AVX1-LABEL: test_x86_sse2_sqrt_sd_vec_load:
				; X64-AVX1: ## %bb.0:
				; X64-AVX1-NEXT: vmovapd (%rdi), %xmm0 ## encoding: [0xc5,0xf9,0x28,0x07]
				; X64-AVX1-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfb,0x51,0xc0]
				; X64-AVX1-NEXT: retq ## encoding: [0xc3]
				;
				; X64-AVX512-LABEL: test_x86_sse2_sqrt_sd_vec_load:
				; X64-AVX512: ## %bb.0:
				; X64-AVX512-NEXT: vmovapd (%rdi), %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x28,0x07]
				; X64-AVX512-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfb,0x51,0xc0]
				; X64-AVX512-NEXT: retq ## encoding: [0xc3]
				%a1 = load <2 x double>, <2 x double>* %a0, align 16
				%res = call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a1) ; <<2 x double>> [#uses=1]
				ret <2 x double> %res
				}


	define <2 x i64> @test_x86_sse2_psll_dq_bs(<2 x i64> %a0) {			define <2 x i64> @test_x86_sse2_psll_dq_bs(<2 x i64> %a0) {
	; SSE-LABEL: test_x86_sse2_psll_dq_bs:			; SSE-LABEL: test_x86_sse2_psll_dq_bs:
	; SSE: ## %bb.0:			; SSE: ## %bb.0:
	; SSE-NEXT: pslldq $7, %xmm0 ## encoding: [0x66,0x0f,0x73,0xf8,0x07]			; SSE-NEXT: pslldq $7, %xmm0 ## encoding: [0x66,0x0f,0x73,0xf8,0x07]
	; SSE-NEXT: ## xmm0 = zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8]			; SSE-NEXT: ## xmm0 = zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7,8]
	; SSE-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]			; SSE-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
	;			;
	; AVX1-LABEL: test_x86_sse2_psll_dq_bs:			; AVX1-LABEL: test_x86_sse2_psll_dq_bs:


	define void @test_x86_sse2_storeu_pd(i8* %a0, <2 x double> %a1) {			define void @test_x86_sse2_storeu_pd(i8* %a0, <2 x double> %a1) {
	; fadd operation forces the execution domain.			; fadd operation forces the execution domain.
	; X86-SSE-LABEL: test_x86_sse2_storeu_pd:			; X86-SSE-LABEL: test_x86_sse2_storeu_pd:
	; X86-SSE: ## %bb.0:			; X86-SSE: ## %bb.0:
	; X86-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]			; X86-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]
	; X86-SSE-NEXT: xorpd %xmm1, %xmm1 ## encoding: [0x66,0x0f,0x57,0xc9]			; X86-SSE-NEXT: xorpd %xmm1, %xmm1 ## encoding: [0x66,0x0f,0x57,0xc9]
	; X86-SSE-NEXT: movhpd LCPI8_0, %xmm1 ## encoding: [0x66,0x0f,0x16,0x0d,A,A,A,A]			; X86-SSE-NEXT: movhpd LCPI11_0, %xmm1 ## encoding: [0x66,0x0f,0x16,0x0d,A,A,A,A]
	; X86-SSE-NEXT: ## fixup A - offset: 4, value: LCPI8_0, kind: FK_Data_4			; X86-SSE-NEXT: ## fixup A - offset: 4, value: LCPI11_0, kind: FK_Data_4
	; X86-SSE-NEXT: ## xmm1 = xmm1[0],mem[0]			; X86-SSE-NEXT: ## xmm1 = xmm1[0],mem[0]
	; X86-SSE-NEXT: addpd %xmm0, %xmm1 ## encoding: [0x66,0x0f,0x58,0xc8]			; X86-SSE-NEXT: addpd %xmm0, %xmm1 ## encoding: [0x66,0x0f,0x58,0xc8]
	; X86-SSE-NEXT: movupd %xmm1, (%eax) ## encoding: [0x66,0x0f,0x11,0x08]			; X86-SSE-NEXT: movupd %xmm1, (%eax) ## encoding: [0x66,0x0f,0x11,0x08]
	; X86-SSE-NEXT: retl ## encoding: [0xc3]			; X86-SSE-NEXT: retl ## encoding: [0xc3]
	;			;
	; X86-AVX1-LABEL: test_x86_sse2_storeu_pd:			; X86-AVX1-LABEL: test_x86_sse2_storeu_pd:
	; X86-AVX1: ## %bb.0:			; X86-AVX1: ## %bb.0:
	; X86-AVX1-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]			; X86-AVX1-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]
	; X86-AVX1-NEXT: vxorpd %xmm1, %xmm1, %xmm1 ## encoding: [0xc5,0xf1,0x57,0xc9]			; X86-AVX1-NEXT: vxorpd %xmm1, %xmm1, %xmm1 ## encoding: [0xc5,0xf1,0x57,0xc9]
	; X86-AVX1-NEXT: vmovhpd LCPI8_0, %xmm1, %xmm1 ## encoding: [0xc5,0xf1,0x16,0x0d,A,A,A,A]			; X86-AVX1-NEXT: vmovhpd LCPI11_0, %xmm1, %xmm1 ## encoding: [0xc5,0xf1,0x16,0x0d,A,A,A,A]
	; X86-AVX1-NEXT: ## fixup A - offset: 4, value: LCPI8_0, kind: FK_Data_4			; X86-AVX1-NEXT: ## fixup A - offset: 4, value: LCPI11_0, kind: FK_Data_4
	; X86-AVX1-NEXT: ## xmm1 = xmm1[0],mem[0]			; X86-AVX1-NEXT: ## xmm1 = xmm1[0],mem[0]
	; X86-AVX1-NEXT: vaddpd %xmm1, %xmm0, %xmm0 ## encoding: [0xc5,0xf9,0x58,0xc1]			; X86-AVX1-NEXT: vaddpd %xmm1, %xmm0, %xmm0 ## encoding: [0xc5,0xf9,0x58,0xc1]
	; X86-AVX1-NEXT: vmovupd %xmm0, (%eax) ## encoding: [0xc5,0xf9,0x11,0x00]			; X86-AVX1-NEXT: vmovupd %xmm0, (%eax) ## encoding: [0xc5,0xf9,0x11,0x00]
	; X86-AVX1-NEXT: retl ## encoding: [0xc3]			; X86-AVX1-NEXT: retl ## encoding: [0xc3]
	;			;
	; X86-AVX512-LABEL: test_x86_sse2_storeu_pd:			; X86-AVX512-LABEL: test_x86_sse2_storeu_pd:
	; X86-AVX512: ## %bb.0:			; X86-AVX512: ## %bb.0:
	; X86-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]			; X86-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]
	; X86-AVX512-NEXT: vmovsd LCPI8_0, %xmm1 ## EVEX TO VEX Compression encoding: [0xc5,0xfb,0x10,0x0d,A,A,A,A]			; X86-AVX512-NEXT: vmovsd LCPI11_0, %xmm1 ## EVEX TO VEX Compression encoding: [0xc5,0xfb,0x10,0x0d,A,A,A,A]
	; X86-AVX512-NEXT: ## fixup A - offset: 4, value: LCPI8_0, kind: FK_Data_4			; X86-AVX512-NEXT: ## fixup A - offset: 4, value: LCPI11_0, kind: FK_Data_4
	; X86-AVX512-NEXT: ## xmm1 = mem[0],zero			; X86-AVX512-NEXT: ## xmm1 = mem[0],zero
	; X86-AVX512-NEXT: vpslldq $8, %xmm1, %xmm1 ## EVEX TO VEX Compression encoding: [0xc5,0xf1,0x73,0xf9,0x08]			; X86-AVX512-NEXT: vpslldq $8, %xmm1, %xmm1 ## EVEX TO VEX Compression encoding: [0xc5,0xf1,0x73,0xf9,0x08]
	; X86-AVX512-NEXT: ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]			; X86-AVX512-NEXT: ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
	; X86-AVX512-NEXT: vaddpd %xmm1, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x58,0xc1]			; X86-AVX512-NEXT: vaddpd %xmm1, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x58,0xc1]
	; X86-AVX512-NEXT: vmovupd %xmm0, (%eax) ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x11,0x00]			; X86-AVX512-NEXT: vmovupd %xmm0, (%eax) ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x11,0x00]
	; X86-AVX512-NEXT: retl ## encoding: [0xc3]			; X86-AVX512-NEXT: retl ## encoding: [0xc3]
	;			;
	; X64-SSE-LABEL: test_x86_sse2_storeu_pd:			; X64-SSE-LABEL: test_x86_sse2_storeu_pd:
	; X64-SSE: ## %bb.0:			; X64-SSE: ## %bb.0:
	; X64-SSE-NEXT: xorpd %xmm1, %xmm1 ## encoding: [0x66,0x0f,0x57,0xc9]			; X64-SSE-NEXT: xorpd %xmm1, %xmm1 ## encoding: [0x66,0x0f,0x57,0xc9]
	; X64-SSE-NEXT: movhpd {{.*}}(%rip), %xmm1 ## encoding: [0x66,0x0f,0x16,0x0d,A,A,A,A]			; X64-SSE-NEXT: movhpd {{.*}}(%rip), %xmm1 ## encoding: [0x66,0x0f,0x16,0x0d,A,A,A,A]
	; X64-SSE-NEXT: ## fixup A - offset: 4, value: LCPI8_0-4, kind: reloc_riprel_4byte			; X64-SSE-NEXT: ## fixup A - offset: 4, value: LCPI11_0-4, kind: reloc_riprel_4byte
	; X64-SSE-NEXT: ## xmm1 = xmm1[0],mem[0]			; X64-SSE-NEXT: ## xmm1 = xmm1[0],mem[0]
	; X64-SSE-NEXT: addpd %xmm0, %xmm1 ## encoding: [0x66,0x0f,0x58,0xc8]			; X64-SSE-NEXT: addpd %xmm0, %xmm1 ## encoding: [0x66,0x0f,0x58,0xc8]
	; X64-SSE-NEXT: movupd %xmm1, (%rdi) ## encoding: [0x66,0x0f,0x11,0x0f]			; X64-SSE-NEXT: movupd %xmm1, (%rdi) ## encoding: [0x66,0x0f,0x11,0x0f]
	; X64-SSE-NEXT: retq ## encoding: [0xc3]			; X64-SSE-NEXT: retq ## encoding: [0xc3]
	;			;
	; X64-AVX1-LABEL: test_x86_sse2_storeu_pd:			; X64-AVX1-LABEL: test_x86_sse2_storeu_pd:
	; X64-AVX1: ## %bb.0:			; X64-AVX1: ## %bb.0:
	; X64-AVX1-NEXT: vxorpd %xmm1, %xmm1, %xmm1 ## encoding: [0xc5,0xf1,0x57,0xc9]			; X64-AVX1-NEXT: vxorpd %xmm1, %xmm1, %xmm1 ## encoding: [0xc5,0xf1,0x57,0xc9]
	; X64-AVX1-NEXT: vmovhpd {{.*}}(%rip), %xmm1, %xmm1 ## encoding: [0xc5,0xf1,0x16,0x0d,A,A,A,A]			; X64-AVX1-NEXT: vmovhpd {{.*}}(%rip), %xmm1, %xmm1 ## encoding: [0xc5,0xf1,0x16,0x0d,A,A,A,A]
	; X64-AVX1-NEXT: ## fixup A - offset: 4, value: LCPI8_0-4, kind: reloc_riprel_4byte			; X64-AVX1-NEXT: ## fixup A - offset: 4, value: LCPI11_0-4, kind: reloc_riprel_4byte
	; X64-AVX1-NEXT: ## xmm1 = xmm1[0],mem[0]			; X64-AVX1-NEXT: ## xmm1 = xmm1[0],mem[0]
	; X64-AVX1-NEXT: vaddpd %xmm1, %xmm0, %xmm0 ## encoding: [0xc5,0xf9,0x58,0xc1]			; X64-AVX1-NEXT: vaddpd %xmm1, %xmm0, %xmm0 ## encoding: [0xc5,0xf9,0x58,0xc1]
	; X64-AVX1-NEXT: vmovupd %xmm0, (%rdi) ## encoding: [0xc5,0xf9,0x11,0x07]			; X64-AVX1-NEXT: vmovupd %xmm0, (%rdi) ## encoding: [0xc5,0xf9,0x11,0x07]
	; X64-AVX1-NEXT: retq ## encoding: [0xc3]			; X64-AVX1-NEXT: retq ## encoding: [0xc3]
	;			;
	; X64-AVX512-LABEL: test_x86_sse2_storeu_pd:			; X64-AVX512-LABEL: test_x86_sse2_storeu_pd:
	; X64-AVX512: ## %bb.0:			; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vmovsd {{.*}}(%rip), %xmm1 ## EVEX TO VEX Compression encoding: [0xc5,0xfb,0x10,0x0d,A,A,A,A]			; X64-AVX512-NEXT: vmovsd {{.*}}(%rip), %xmm1 ## EVEX TO VEX Compression encoding: [0xc5,0xfb,0x10,0x0d,A,A,A,A]
	; X64-AVX512-NEXT: ## fixup A - offset: 4, value: LCPI8_0-4, kind: reloc_riprel_4byte			; X64-AVX512-NEXT: ## fixup A - offset: 4, value: LCPI11_0-4, kind: reloc_riprel_4byte
	; X64-AVX512-NEXT: ## xmm1 = mem[0],zero			; X64-AVX512-NEXT: ## xmm1 = mem[0],zero
	; X64-AVX512-NEXT: vpslldq $8, %xmm1, %xmm1 ## EVEX TO VEX Compression encoding: [0xc5,0xf1,0x73,0xf9,0x08]			; X64-AVX512-NEXT: vpslldq $8, %xmm1, %xmm1 ## EVEX TO VEX Compression encoding: [0xc5,0xf1,0x73,0xf9,0x08]
	; X64-AVX512-NEXT: ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]			; X64-AVX512-NEXT: ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
	; X64-AVX512-NEXT: vaddpd %xmm1, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x58,0xc1]			; X64-AVX512-NEXT: vaddpd %xmm1, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x58,0xc1]
	; X64-AVX512-NEXT: vmovupd %xmm0, (%rdi) ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x11,0x07]			; X64-AVX512-NEXT: vmovupd %xmm0, (%rdi) ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x11,0x07]
	; X64-AVX512-NEXT: retq ## encoding: [0xc3]			; X64-AVX512-NEXT: retq ## encoding: [0xc3]
	%a2 = fadd <2 x double> %a1, <double 0x0, double 0x4200000000000000>			%a2 = fadd <2 x double> %a1, <double 0x0, double 0x4200000000000000>
	call void @llvm.x86.sse2.storeu.pd(i8* %a0, <2 x double> %a2)			call void @llvm.x86.sse2.storeu.pd(i8* %a0, <2 x double> %a2)

llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86.ll

	; AVX512-NEXT: vpsubusw %xmm1, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0xd9,0xc1]			; AVX512-NEXT: vpsubusw %xmm1, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0xd9,0xc1]
	; AVX512-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]			; AVX512-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
	%res = call <8 x i16> @llvm.x86.sse2.psubus.w(<8 x i16> %a0, <8 x i16> %a1) ; <<8 x i16>> [#uses=1]			%res = call <8 x i16> @llvm.x86.sse2.psubus.w(<8 x i16> %a0, <8 x i16> %a1) ; <<8 x i16>> [#uses=1]
	ret <8 x i16> %res			ret <8 x i16> %res
	}			}
	declare <8 x i16> @llvm.x86.sse2.psubus.w(<8 x i16>, <8 x i16>) nounwind readnone			declare <8 x i16> @llvm.x86.sse2.psubus.w(<8 x i16>, <8 x i16>) nounwind readnone


	define <2 x double> @test_x86_sse2_sqrt_pd(<2 x double> %a0) {
	; SSE-LABEL: test_x86_sse2_sqrt_pd:
	; SSE: ## %bb.0:
	; SSE-NEXT: sqrtpd %xmm0, %xmm0 ## encoding: [0x66,0x0f,0x51,0xc0]
	; SSE-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
	;
	; AVX1-LABEL: test_x86_sse2_sqrt_pd:
	; AVX1: ## %bb.0:
	; AVX1-NEXT: vsqrtpd %xmm0, %xmm0 ## encoding: [0xc5,0xf9,0x51,0xc0]
	; AVX1-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
	;
	; AVX512-LABEL: test_x86_sse2_sqrt_pd:
	; AVX512: ## %bb.0:
	; AVX512-NEXT: vsqrtpd %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x51,0xc0]
	; AVX512-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
	%res = call <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double> %a0) ; <<2 x double>> [#uses=1]
	ret <2 x double> %res
	}
	declare <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double>) nounwind readnone


	define <2 x double> @test_x86_sse2_sqrt_sd(<2 x double> %a0) {
	; SSE-LABEL: test_x86_sse2_sqrt_sd:
	; SSE: ## %bb.0:
	; SSE-NEXT: sqrtsd %xmm0, %xmm0 ## encoding: [0xf2,0x0f,0x51,0xc0]
	; SSE-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
	;
	; AVX1-LABEL: test_x86_sse2_sqrt_sd:
	; AVX1: ## %bb.0:
	; AVX1-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfb,0x51,0xc0]
	; AVX1-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
	;
	; AVX512-LABEL: test_x86_sse2_sqrt_sd:
	; AVX512: ## %bb.0:
	; AVX512-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xfb,0x51,0xc0]
	; AVX512-NEXT: ret{{[l\|q]}} ## encoding: [0xc3]
	%res = call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a0) ; <<2 x double>> [#uses=1]
	ret <2 x double> %res
	}
	declare <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double>) nounwind readnone


	define <2 x double> @test_x86_sse2_sqrt_sd_vec_load(<2 x double>* %a0) {
	; X86-SSE-LABEL: test_x86_sse2_sqrt_sd_vec_load:
	; X86-SSE: ## %bb.0:
	; X86-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]
	; X86-SSE-NEXT: movapd (%eax), %xmm0 ## encoding: [0x66,0x0f,0x28,0x00]
	; X86-SSE-NEXT: sqrtsd %xmm0, %xmm0 ## encoding: [0xf2,0x0f,0x51,0xc0]
	; X86-SSE-NEXT: retl ## encoding: [0xc3]
	;
	; X86-AVX1-LABEL: test_x86_sse2_sqrt_sd_vec_load:
	; X86-AVX1: ## %bb.0:
	; X86-AVX1-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]
	; X86-AVX1-NEXT: vmovapd (%eax), %xmm0 ## encoding: [0xc5,0xf9,0x28,0x00]
	; X86-AVX1-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfb,0x51,0xc0]
	; X86-AVX1-NEXT: retl ## encoding: [0xc3]
	;
	; X86-AVX512-LABEL: test_x86_sse2_sqrt_sd_vec_load:
	; X86-AVX512: ## %bb.0:
	; X86-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x04]
	; X86-AVX512-NEXT: vmovapd (%eax), %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x28,0x00]
	; X86-AVX512-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xfb,0x51,0xc0]
	; X86-AVX512-NEXT: retl ## encoding: [0xc3]
	;
	; X64-SSE-LABEL: test_x86_sse2_sqrt_sd_vec_load:
	; X64-SSE: ## %bb.0:
	; X64-SSE-NEXT: movapd (%rdi), %xmm0 ## encoding: [0x66,0x0f,0x28,0x07]
	; X64-SSE-NEXT: sqrtsd %xmm0, %xmm0 ## encoding: [0xf2,0x0f,0x51,0xc0]
	; X64-SSE-NEXT: retq ## encoding: [0xc3]
	;
	; X64-AVX1-LABEL: test_x86_sse2_sqrt_sd_vec_load:
	; X64-AVX1: ## %bb.0:
	; X64-AVX1-NEXT: vmovapd (%rdi), %xmm0 ## encoding: [0xc5,0xf9,0x28,0x07]
	; X64-AVX1-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 ## encoding: [0xc5,0xfb,0x51,0xc0]
	; X64-AVX1-NEXT: retq ## encoding: [0xc3]
	;
	; X64-AVX512-LABEL: test_x86_sse2_sqrt_sd_vec_load:
	; X64-AVX512: ## %bb.0:
	; X64-AVX512-NEXT: vmovapd (%rdi), %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0x28,0x07]
	; X64-AVX512-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xfb,0x51,0xc0]
	; X64-AVX512-NEXT: retq ## encoding: [0xc3]
	%a1 = load <2 x double>, <2 x double>* %a0, align 16
	%res = call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a1) ; <<2 x double>> [#uses=1]
	ret <2 x double> %res
	}


	define i32 @test_x86_sse2_ucomieq_sd(<2 x double> %a0, <2 x double> %a1) {			define i32 @test_x86_sse2_ucomieq_sd(<2 x double> %a0, <2 x double> %a1) {
	; SSE-LABEL: test_x86_sse2_ucomieq_sd:			; SSE-LABEL: test_x86_sse2_ucomieq_sd:
	; SSE: ## %bb.0:			; SSE: ## %bb.0:
	; SSE-NEXT: ucomisd %xmm1, %xmm0 ## encoding: [0x66,0x0f,0x2e,0xc1]			; SSE-NEXT: ucomisd %xmm1, %xmm0 ## encoding: [0x66,0x0f,0x2e,0xc1]
	; SSE-NEXT: setnp %al ## encoding: [0x0f,0x9b,0xc0]			; SSE-NEXT: setnp %al ## encoding: [0x0f,0x9b,0xc0]
	; SSE-NEXT: sete %cl ## encoding: [0x0f,0x94,0xc1]			; SSE-NEXT: sete %cl ## encoding: [0x0f,0x94,0xc1]
	; SSE-NEXT: andb %al, %cl ## encoding: [0x20,0xc1]			; SSE-NEXT: andb %al, %cl ## encoding: [0x20,0xc1]
	; SSE-NEXT: movzbl %cl, %eax ## encoding: [0x0f,0xb6,0xc1]			; SSE-NEXT: movzbl %cl, %eax ## encoding: [0x0f,0xb6,0xc1]

llvm/trunk/test/CodeGen/X86/sse_partial_update.ll

entry:
tail call void @callee(double %conv, double %conv3) nounwind		tail call void @callee(double %conv, double %conv3) nounwind
ret void		ret void
}		}
declare <4 x float> @llvm.x86.sse.rcp.ss(<4 x float>) nounwind readnone		declare <4 x float> @llvm.x86.sse.rcp.ss(<4 x float>) nounwind readnone

define void @sqrtss(<4 x float> %a) nounwind uwtable ssp {		define void @sqrtss(<4 x float> %a) nounwind uwtable ssp {
; CHECK-LABEL: sqrtss:		; CHECK-LABEL: sqrtss:
; CHECK: ## %bb.0: ## %entry		; CHECK: ## %bb.0: ## %entry
; CHECK-NEXT: sqrtss %xmm0, %xmm0		; CHECK-NEXT: sqrtss %xmm0, %xmm1
; CHECK-NEXT: cvtss2sd %xmm0, %xmm2		; CHECK-NEXT: cvtss2sd %xmm1, %xmm2
; CHECK-NEXT: movshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]		; CHECK-NEXT: movshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
		; CHECK-NEXT: xorps %xmm1, %xmm1
; CHECK-NEXT: cvtss2sd %xmm0, %xmm1		; CHECK-NEXT: cvtss2sd %xmm0, %xmm1
; CHECK-NEXT: movaps %xmm2, %xmm0		; CHECK-NEXT: movaps %xmm2, %xmm0
; CHECK-NEXT: jmp _callee ## TAILCALL		; CHECK-NEXT: jmp _callee ## TAILCALL
entry:		entry:

%0 = tail call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %a) nounwind		%0 = tail call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %a) nounwind
%a.addr.0.extract = extractelement <4 x float> %0, i32 0		%a.addr.0.extract = extractelement <4 x float> %0, i32 0
%conv = fpext float %a.addr.0.extract to double		%conv = fpext float %a.addr.0.extract to double
%a.addr.4.extract = extractelement <4 x float> %0, i32 1		%a.addr.4.extract = extractelement <4 x float> %0, i32 1
%conv3 = fpext float %a.addr.4.extract to double		%conv3 = fpext float %a.addr.4.extract to double
tail call void @callee(double %conv, double %conv3) nounwind		tail call void @callee(double %conv, double %conv3) nounwind
ret void		ret void
}		}
declare <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float>) nounwind readnone		declare <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float>) nounwind readnone

define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp {		define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp {
; CHECK-LABEL: sqrtsd:		; CHECK-LABEL: sqrtsd:
; CHECK: ## %bb.0: ## %entry		; CHECK: ## %bb.0: ## %entry
; CHECK-NEXT: sqrtsd %xmm0, %xmm0		; CHECK-NEXT: sqrtsd %xmm0, %xmm1
; CHECK-NEXT: cvtsd2ss %xmm0, %xmm2		; CHECK-NEXT: cvtsd2ss %xmm1, %xmm2
; CHECK-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]		; CHECK-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
		; CHECK-NEXT: xorps %xmm1, %xmm1
; CHECK-NEXT: cvtsd2ss %xmm0, %xmm1		; CHECK-NEXT: cvtsd2ss %xmm0, %xmm1
; CHECK-NEXT: movaps %xmm2, %xmm0		; CHECK-NEXT: movaps %xmm2, %xmm0
; CHECK-NEXT: jmp _callee2 ## TAILCALL		; CHECK-NEXT: jmp _callee2 ## TAILCALL
entry:		entry:

%0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind		%0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind
%a0 = extractelement <2 x double> %0, i32 0		%a0 = extractelement <2 x double> %0, i32 0
%conv = fptrunc double %a0 to float		%conv = fptrunc double %a0 to float

llvm/trunk/test/Transforms/InstCombine/X86/x86-sse.ll

;
%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3		%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3
%5 = tail call <4 x float> @llvm.x86.sse.rcp.ss(<4 x float> %4)		%5 = tail call <4 x float> @llvm.x86.sse.rcp.ss(<4 x float> %4)
%6 = extractelement <4 x float> %5, i32 1		%6 = extractelement <4 x float> %5, i32 1
ret float %6		ret float %6
}		}

define float @test_sqrt_ss_0(float %a) {		define float @test_sqrt_ss_0(float %a) {
; CHECK-LABEL: @test_sqrt_ss_0(		; CHECK-LABEL: @test_sqrt_ss_0(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float %a, i32 0		; CHECK-NEXT: [[TMP1:%.*]] = call float @llvm.sqrt.f32(float %a)
; CHECK-NEXT: [[TMP2:%.*]] = tail call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> [[TMP1]])		; CHECK-NEXT: ret float [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
; CHECK-NEXT: ret float [[TMP3]]
;		;
%1 = insertelement <4 x float> undef, float %a, i32 0		%1 = insertelement <4 x float> undef, float %a, i32 0
%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1		%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1
%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2		%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2
%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3		%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3
%5 = tail call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %4)		%5 = tail call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %4)
%6 = extractelement <4 x float> %5, i32 0		%6 = extractelement <4 x float> %5, i32 0
ret float %6		ret float %6

llvm/trunk/test/Transforms/InstCombine/X86/x86-sse2.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define double @test_sqrt_sd_0(double %a) {			define double @test_sqrt_sd_0(double %a) {
	; CHECK-LABEL: @test_sqrt_sd_0(			; CHECK-LABEL: @test_sqrt_sd_0(
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0			; CHECK-NEXT: [[TMP1:%.*]] = call double @llvm.sqrt.f64(double %a)
	; CHECK-NEXT: [[TMP2:%.*]] = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> [[TMP1]])			; CHECK-NEXT: ret double [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
	; CHECK-NEXT: ret double [[TMP3]]
	;			;
	%1 = insertelement <2 x double> undef, double %a, i32 0			%1 = insertelement <2 x double> undef, double %a, i32 0
	%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1			%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
	%3 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %2)			%3 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %2)
	%4 = extractelement <2 x double> %3, i32 0			%4 = extractelement <2 x double> %3, i32 0
	ret double %4			ret double %4
	}			}


Revision Contents

Diff 151536

llvm/trunk/include/llvm/IR/IntrinsicsX86.td

llvm/trunk/lib/IR/AutoUpgrade.cpp

llvm/trunk/lib/Target/X86/X86InstrAVX512.td

llvm/trunk/lib/Target/X86/X86InstrSSE.td

llvm/trunk/lib/Target/X86/X86IntrinsicsInfo.h

llvm/trunk/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp

llvm/trunk/test/CodeGen/X86/avx-intrinsics-fast-isel.ll

llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll

llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86.ll

llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll

llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-upgrade.ll

llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics.ll

llvm/trunk/test/CodeGen/X86/fold-load-unops.ll

llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel.ll

llvm/trunk/test/CodeGen/X86/sse-intrinsics-x86-upgrade.ll

llvm/trunk/test/CodeGen/X86/sse-intrinsics-x86.ll

llvm/trunk/test/CodeGen/X86/sse-scalar-fp-arith.ll

llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll

llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll

llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86.ll

llvm/trunk/test/CodeGen/X86/sse_partial_update.ll

llvm/trunk/test/Transforms/InstCombine/X86/x86-sse.ll

llvm/trunk/test/Transforms/InstCombine/X86/x86-sse2.ll
