This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
ClosedPublic

Authored by RKSimon on Jul 7 2016, 11:53 AM.

Details

Reviewers

spatel
eli.friedman
andreadb
mkuper
craig.topper

Commits

rG30082a166cab: Merging r275981 and r276740…
rG0ea8d275cc97: [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic…
rL275981: [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using…

Summary

D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.

It turns out that the behaviour of these intrinsics differs enough from generic IR to cause problems: INF/NaN/out-of-range values are guaranteed to produce 0x80000000, which plays havoc with constant folding, since folding converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR, and which I was trying to match).
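
To make the mismatch concrete, here is a minimal C++ sketch of the 32-bit truncating conversion semantics described above. The helper name `emulate_cvttss2si` is hypothetical, and this is a software model for illustration only, not code from the patch. NaN, infinities and out-of-range inputs all map to 0x80000000, whereas a generic `fptosi` has no defined result for those inputs.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical software model of a 32-bit truncating SSE conversion.
// Assumption: invalid inputs produce the fixed value 0x80000000.
int32_t emulate_cvttss2si(float x) {
  // NaN, +/-Inf and anything outside [-2^31, 2^31) are invalid inputs.
  if (std::isnan(x) || x >= 2147483648.0f || x < -2147483648.0f)
    return std::numeric_limits<int32_t>::min(); // bit pattern 0x80000000
  // In-range values truncate toward zero, which is what static_cast does.
  return static_cast<int32_t>(x);
}
```

A constant folder that models these inputs as zero or undef would therefore disagree with what the instruction produces at run time.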

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
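
The rounding-mode dependence can be illustrated with standard C++ rather than the intrinsics themselves. This sketch assumes the platform's `std::lrint` honours the mode set through `std::fesetround`; the helper name `round_under_mode` is hypothetical. An inexact input such as 2.5 gives different integers under different modes, while an exactly representable integer input does not.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: convert x to an integer under the given <cfenv>
// rounding mode, then restore the previous mode.
long round_under_mode(double x, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  // volatile keeps the conversion at run time, between the two mode changes.
  volatile double input = x;
  volatile long result = std::lrint(input);
  std::fesetround(saved);
  return result;
}
```

With this helper, 2.5 converts to 2 under `FE_TONEAREST` and to 3 under `FE_UPWARD`, which is why a compile-time fold of an inexact non-truncating conversion cannot be trusted to match the run-time result.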

A companion clang patch is at D22105.

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 63110.Jul 7 2016, 11:53 AM

RKSimon retitled this revision to [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR.

RKSimon updated this object.

RKSimon added reviewers: eli.friedman, mkuper, craig.topper, spatel, andreadb.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: llvm-commits.

eli.friedman added inline comments.Jul 7 2016, 3:24 PM

lib/Analysis/ConstantFolding.cpp
1442	This is supposed to give up if it finds an out-of-range float; does that not work correctly?

RKSimon added inline comments.Jul 7 2016, 4:07 PM

lib/Analysis/ConstantFolding.cpp
1442	I was possibly being over-thorough - there is scope for some very careful constant folding. The cvt versions rely on the runtime rounding mode, so constant folding may not produce the same result (we could maybe just permit exact conversions?). The cvtt versions should work, but that's not what I saw in some basic tests on godbolt with large values such as FLT_MAX. I'll see if we can improve the existing tests to see what is going on and make a decision on what we can safely support.

eli.friedman added inline comments.Jul 7 2016, 4:40 PM

lib/Analysis/ConstantFolding.cpp
1442	Hmm... for the rounding mode, I think we generally take the position that we don't support changing it. I mean, if we did allow that, we couldn't implement addps on top of the LLVM IR fadd. (We might have to revisit this once LLVM gets proper support for rounding modes.) Exact conversions should be safe either way.

RKSimon mentioned this in rL274846: [X86][SSE] Improve constant folding tests for CVTSD/CVTSS/CVTTSD/CVTTSS.Jul 8 2016, 6:35 AM

Updated the non-truncating conversion constant folding not to accept inexact conversions
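
The resulting folding policy can be sketched in plain C++ as follows. This is a hypothetical stand-in (`try_fold_fp_to_i32` is not an LLVM function, and the real code works on `APFloat`): it gives up on NaN and out-of-range inputs, always folds truncating conversions of in-range values, and folds non-truncating conversions only when the input is already an exact integer.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical model of the folding policy; returns true and writes `out`
// only when folding is considered safe.
bool try_fold_fp_to_i32(float x, bool round_toward_zero, int32_t &out) {
  // NaN, infinities and out-of-range inputs are never folded.
  if (std::isnan(x) || x >= 2147483648.0f || x < -2147483648.0f)
    return false;
  const float truncated = std::trunc(x);
  const bool exact = (truncated == x);
  // Non-truncating conversions depend on the runtime rounding mode,
  // so only exact inputs are folded for them.
  if (!round_toward_zero && !exact)
    return false;
  out = static_cast<int32_t>(truncated);
  return true;
}
```

Under this model 2.5 folds to 2 for a truncating conversion but is left alone for a non-truncating one, while -7.0 folds to -7 in both cases.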

LGTM.

lib/Analysis/ConstantFolding.cpp
1682	Indentation?

This revision is now accepted and ready to land.Jul 18 2016, 10:25 AM

Closed by commit rL275981: [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using… (authored by RKSimon). Jul 19 2016, 8:15 AM

This revision was automatically updated to reflect the committed changes.

Thanks Eli - please can you confirm if D22105 is OK as well?

Revision Contents

Path	Size
include/llvm/IR/IntrinsicsX86.td (revision 275781)	6 lines
lib/Analysis/ConstantFolding.cpp (revision 275781)	11 lines
lib/IR/AutoUpgrade.cpp (revision 275781)	8 lines
lib/Target/X86/X86InstrSSE.td (revision 275781)	31 lines
test/CodeGen/X86/avx-intrinsics-fast-isel.ll (revision 275781)	6 lines
test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll (revision 275781)	25 lines
test/CodeGen/X86/avx-intrinsics-x86.ll (revision 275781)	37 lines
test/CodeGen/X86/sse-intrinsics-fast-isel-x86_64.ll (revision 275781)	11 lines
test/CodeGen/X86/sse-intrinsics-fast-isel.ll (revision 275781)	18 lines
test/CodeGen/X86/sse2-intrinsics-fast-isel-x86_64.ll (revision 275781)	11 lines
test/CodeGen/X86/sse2-intrinsics-fast-isel.ll (revision 275781)	22 lines
test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll (revision 275781)	13 lines
test/CodeGen/X86/sse2-intrinsics-x86.ll (revision 275781)	18 lines
test/Transforms/ConstProp/calls.ll (revision 275781)	8 lines

Diff 64317

include/llvm/IR/IntrinsicsX86.td

[473 lines not shown]	let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".
def int_x86_sse2_cvtpd2dq : GCCBuiltin<"__builtin_ia32_cvtpd2dq">,		def int_x86_sse2_cvtpd2dq : GCCBuiltin<"__builtin_ia32_cvtpd2dq">,
Intrinsic<[llvm_v4i32_ty], [llvm_v2f64_ty], [IntrNoMem]>;		Intrinsic<[llvm_v4i32_ty], [llvm_v2f64_ty], [IntrNoMem]>;
def int_x86_sse2_cvttpd2dq : GCCBuiltin<"__builtin_ia32_cvttpd2dq">,		def int_x86_sse2_cvttpd2dq : GCCBuiltin<"__builtin_ia32_cvttpd2dq">,
Intrinsic<[llvm_v4i32_ty], [llvm_v2f64_ty], [IntrNoMem]>;		Intrinsic<[llvm_v4i32_ty], [llvm_v2f64_ty], [IntrNoMem]>;
def int_x86_sse2_cvtpd2ps : GCCBuiltin<"__builtin_ia32_cvtpd2ps">,		def int_x86_sse2_cvtpd2ps : GCCBuiltin<"__builtin_ia32_cvtpd2ps">,
Intrinsic<[llvm_v4f32_ty], [llvm_v2f64_ty], [IntrNoMem]>;		Intrinsic<[llvm_v4f32_ty], [llvm_v2f64_ty], [IntrNoMem]>;
def int_x86_sse2_cvtps2dq : GCCBuiltin<"__builtin_ia32_cvtps2dq">,		def int_x86_sse2_cvtps2dq : GCCBuiltin<"__builtin_ia32_cvtps2dq">,
Intrinsic<[llvm_v4i32_ty], [llvm_v4f32_ty], [IntrNoMem]>;		Intrinsic<[llvm_v4i32_ty], [llvm_v4f32_ty], [IntrNoMem]>;
		def int_x86_sse2_cvttps2dq : GCCBuiltin<"__builtin_ia32_cvttps2dq">,
		Intrinsic<[llvm_v4i32_ty], [llvm_v4f32_ty], [IntrNoMem]>;
def int_x86_sse2_cvtsd2si : GCCBuiltin<"__builtin_ia32_cvtsd2si">,		def int_x86_sse2_cvtsd2si : GCCBuiltin<"__builtin_ia32_cvtsd2si">,
Intrinsic<[llvm_i32_ty], [llvm_v2f64_ty], [IntrNoMem]>;		Intrinsic<[llvm_i32_ty], [llvm_v2f64_ty], [IntrNoMem]>;
def int_x86_sse2_cvtsd2si64 : GCCBuiltin<"__builtin_ia32_cvtsd2si64">,		def int_x86_sse2_cvtsd2si64 : GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
Intrinsic<[llvm_i64_ty], [llvm_v2f64_ty], [IntrNoMem]>;		Intrinsic<[llvm_i64_ty], [llvm_v2f64_ty], [IntrNoMem]>;
def int_x86_sse2_cvttsd2si : GCCBuiltin<"__builtin_ia32_cvttsd2si">,		def int_x86_sse2_cvttsd2si : GCCBuiltin<"__builtin_ia32_cvttsd2si">,
Intrinsic<[llvm_i32_ty], [llvm_v2f64_ty], [IntrNoMem]>;		Intrinsic<[llvm_i32_ty], [llvm_v2f64_ty], [IntrNoMem]>;
def int_x86_sse2_cvttsd2si64 : GCCBuiltin<"__builtin_ia32_cvttsd2si64">,		def int_x86_sse2_cvttsd2si64 : GCCBuiltin<"__builtin_ia32_cvttsd2si64">,
Intrinsic<[llvm_i64_ty], [llvm_v2f64_ty], [IntrNoMem]>;		Intrinsic<[llvm_i64_ty], [llvm_v2f64_ty], [IntrNoMem]>;
[1,017 lines not shown]
// Vector convert		// Vector convert
let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".		let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".
def int_x86_avx_cvtdq2_ps_256 : GCCBuiltin<"__builtin_ia32_cvtdq2ps256">,		def int_x86_avx_cvtdq2_ps_256 : GCCBuiltin<"__builtin_ia32_cvtdq2ps256">,
Intrinsic<[llvm_v8f32_ty], [llvm_v8i32_ty], [IntrNoMem]>;		Intrinsic<[llvm_v8f32_ty], [llvm_v8i32_ty], [IntrNoMem]>;
def int_x86_avx_cvt_pd2_ps_256 : GCCBuiltin<"__builtin_ia32_cvtpd2ps256">,		def int_x86_avx_cvt_pd2_ps_256 : GCCBuiltin<"__builtin_ia32_cvtpd2ps256">,
Intrinsic<[llvm_v4f32_ty], [llvm_v4f64_ty], [IntrNoMem]>;		Intrinsic<[llvm_v4f32_ty], [llvm_v4f64_ty], [IntrNoMem]>;
def int_x86_avx_cvt_ps2dq_256 : GCCBuiltin<"__builtin_ia32_cvtps2dq256">,		def int_x86_avx_cvt_ps2dq_256 : GCCBuiltin<"__builtin_ia32_cvtps2dq256">,
Intrinsic<[llvm_v8i32_ty], [llvm_v8f32_ty], [IntrNoMem]>;		Intrinsic<[llvm_v8i32_ty], [llvm_v8f32_ty], [IntrNoMem]>;
		def int_x86_avx_cvtt_pd2dq_256 : GCCBuiltin<"__builtin_ia32_cvttpd2dq256">,
		Intrinsic<[llvm_v4i32_ty], [llvm_v4f64_ty], [IntrNoMem]>;
def int_x86_avx_cvt_pd2dq_256 : GCCBuiltin<"__builtin_ia32_cvtpd2dq256">,		def int_x86_avx_cvt_pd2dq_256 : GCCBuiltin<"__builtin_ia32_cvtpd2dq256">,
Intrinsic<[llvm_v4i32_ty], [llvm_v4f64_ty], [IntrNoMem]>;		Intrinsic<[llvm_v4i32_ty], [llvm_v4f64_ty], [IntrNoMem]>;
		def int_x86_avx_cvtt_ps2dq_256 : GCCBuiltin<"__builtin_ia32_cvttps2dq256">,
		Intrinsic<[llvm_v8i32_ty], [llvm_v8f32_ty], [IntrNoMem]>;
}		}

// Vector bit test		// Vector bit test
let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".		let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".
def int_x86_avx_vtestz_pd : GCCBuiltin<"__builtin_ia32_vtestzpd">,		def int_x86_avx_vtestz_pd : GCCBuiltin<"__builtin_ia32_vtestzpd">,
Intrinsic<[llvm_i32_ty], [llvm_v2f64_ty,		Intrinsic<[llvm_i32_ty], [llvm_v2f64_ty,
llvm_v2f64_ty], [IntrNoMem]>;		llvm_v2f64_ty], [IntrNoMem]>;
def int_x86_avx_vtestc_pd : GCCBuiltin<"__builtin_ia32_vtestcpd">,		def int_x86_avx_vtestc_pd : GCCBuiltin<"__builtin_ia32_vtestcpd">,
[5,910 lines not shown]

lib/Analysis/ConstantFolding.cpp

[1,418 lines not shown]

/// Attempt to fold an SSE floating point to integer conversion of a constant		/// Attempt to fold an SSE floating point to integer conversion of a constant
/// floating point. If roundTowardZero is false, the default IEEE rounding is		/// floating point. If roundTowardZero is false, the default IEEE rounding is
/// used (toward nearest, ties to even). This matches the behavior of the		/// used (toward nearest, ties to even). This matches the behavior of the
/// non-truncating SSE instructions in the default rounding mode. The desired		/// non-truncating SSE instructions in the default rounding mode. The desired
/// integer type Ty is used to select how many bits are available for the		/// integer type Ty is used to select how many bits are available for the
/// result. Returns null if the conversion cannot be performed, otherwise		/// result. Returns null if the conversion cannot be performed, otherwise
/// returns the Constant value resulting from the conversion.		/// returns the Constant value resulting from the conversion.
Constant *ConstantFoldConvertToInt(const APFloat &Val, bool roundTowardZero,		Constant *ConstantFoldSSEConvertToInt(const APFloat &Val, bool roundTowardZero,
Type *Ty) {		Type *Ty) {
// All of these conversion intrinsics form an integer of at most 64bits.		// All of these conversion intrinsics form an integer of at most 64bits.
unsigned ResultWidth = Ty->getIntegerBitWidth();		unsigned ResultWidth = Ty->getIntegerBitWidth();
assert(ResultWidth <= 64 &&		assert(ResultWidth <= 64 &&
"Can only constant fold conversions to 64 and 32 bit ints");		"Can only constant fold conversions to 64 and 32 bit ints");

uint64_t UIntVal;		uint64_t UIntVal;
bool isExact = false;		bool isExact = false;
APFloat::roundingMode mode = roundTowardZero? APFloat::rmTowardZero		APFloat::roundingMode mode = roundTowardZero? APFloat::rmTowardZero
: APFloat::rmNearestTiesToEven;		: APFloat::rmNearestTiesToEven;
APFloat::opStatus status = Val.convertToInteger(&UIntVal, ResultWidth,		APFloat::opStatus status = Val.convertToInteger(&UIntVal, ResultWidth,
/*isSigned=*/true, mode,		/*isSigned=*/true, mode,
&isExact);		&isExact);
if (status != APFloat::opOK && status != APFloat::opInexact)		if (status != APFloat::opOK &&
		(!roundTowardZero || status != APFloat::opInexact))
return nullptr;		return nullptr;
eli.friedman: This is supposed to give up if it finds an out-of-range float; does that not work correctly?
RKSimon: I was possibly being over-thorough - there is scope for some very careful constant folding. The cvt versions rely on the runtime rounding mode, so constant folding may not produce the same result (we could maybe just permit exact conversions?). The cvtt versions should work, but that's not what I saw in some basic tests on godbolt with large values such as FLT_MAX. I'll see if we can improve the existing tests to see what is going on and make a decision on what we can safely support.
eli.friedman: Hmm... for the rounding mode, I think we generally take the position that we don't support changing it. I mean, if we did allow that, we couldn't implement addps on top of the LLVM IR fadd. (We might have to revisit this once LLVM gets proper support for rounding modes.) Exact conversions should be safe either way.
return ConstantInt::get(Ty, UIntVal, /*isSigned=*/true);		return ConstantInt::get(Ty, UIntVal, /*isSigned=*/true);
}		}

double getValueAsDouble(ConstantFP *Op) {		double getValueAsDouble(ConstantFP *Op) {
Type *Ty = Op->getType();		Type *Ty = Op->getType();

if (Ty->isFloatTy())		if (Ty->isFloatTy())
return Op->getValueAPF().convertToFloat();		return Op->getValueAPF().convertToFloat();
[221 lines not shown]	if (isa<ConstantVector>(Operands[0]) ||
switch (IntrinsicID) {		switch (IntrinsicID) {
default: break;		default: break;
case Intrinsic::x86_sse_cvtss2si:		case Intrinsic::x86_sse_cvtss2si:
case Intrinsic::x86_sse_cvtss2si64:		case Intrinsic::x86_sse_cvtss2si64:
case Intrinsic::x86_sse2_cvtsd2si:		case Intrinsic::x86_sse2_cvtsd2si:
case Intrinsic::x86_sse2_cvtsd2si64:		case Intrinsic::x86_sse2_cvtsd2si64:
if (ConstantFP *FPOp =		if (ConstantFP *FPOp =
dyn_cast_or_null<ConstantFP>(Op->getAggregateElement(0U)))		dyn_cast_or_null<ConstantFP>(Op->getAggregateElement(0U)))
return ConstantFoldConvertToInt(FPOp->getValueAPF(),		return ConstantFoldSSEConvertToInt(FPOp->getValueAPF(),
/*roundTowardZero=*/false, Ty);		/*roundTowardZero=*/false, Ty);
		eli.friedman: Indentation?
case Intrinsic::x86_sse_cvttss2si:		case Intrinsic::x86_sse_cvttss2si:
case Intrinsic::x86_sse_cvttss2si64:		case Intrinsic::x86_sse_cvttss2si64:
case Intrinsic::x86_sse2_cvttsd2si:		case Intrinsic::x86_sse2_cvttsd2si:
case Intrinsic::x86_sse2_cvttsd2si64:		case Intrinsic::x86_sse2_cvttsd2si64:
if (ConstantFP *FPOp =		if (ConstantFP *FPOp =
dyn_cast_or_null<ConstantFP>(Op->getAggregateElement(0U)))		dyn_cast_or_null<ConstantFP>(Op->getAggregateElement(0U)))
return ConstantFoldConvertToInt(FPOp->getValueAPF(),		return ConstantFoldSSEConvertToInt(FPOp->getValueAPF(),
/*roundTowardZero=*/true, Ty);		/*roundTowardZero=*/true, Ty);
}		}
}		}

if (isa<UndefValue>(Operands[0])) {		if (isa<UndefValue>(Operands[0])) {
if (IntrinsicID == Intrinsic::bswap)		if (IntrinsicID == Intrinsic::bswap)
return Operands[0];		return Operands[0];
return nullptr;		return nullptr;
[232 lines not shown]

lib/IR/AutoUpgrade.cpp

[245 lines not shown]	if (IsX86 &&
Name.startswith("sse41.pmovsx") ||		Name.startswith("sse41.pmovsx") ||
Name.startswith("sse41.pmovzx") ||		Name.startswith("sse41.pmovzx") ||
Name.startswith("avx2.pmovsx") ||		Name.startswith("avx2.pmovsx") ||
Name.startswith("avx2.pmovzx") ||		Name.startswith("avx2.pmovzx") ||
Name == "sse2.cvtdq2pd" ||		Name == "sse2.cvtdq2pd" ||
Name == "sse2.cvtps2pd" ||		Name == "sse2.cvtps2pd" ||
Name == "avx.cvtdq2.pd.256" ||		Name == "avx.cvtdq2.pd.256" ||
Name == "avx.cvt.ps2.pd.256" ||		Name == "avx.cvt.ps2.pd.256" ||
Name == "sse2.cvttps2dq" ||
Name.startswith("avx.cvtt.") ||
Name.startswith("avx.vinsertf128.") ||		Name.startswith("avx.vinsertf128.") ||
Name == "avx2.vinserti128" ||		Name == "avx2.vinserti128" ||
Name.startswith("avx.vextractf128.") ||		Name.startswith("avx.vextractf128.") ||
Name == "avx2.vextracti128" ||		Name == "avx2.vextracti128" ||
Name.startswith("sse4a.movnt.") ||		Name.startswith("sse4a.movnt.") ||
Name.startswith("avx.movnt.") ||		Name.startswith("avx.movnt.") ||
Name.startswith("avx512.storent.") ||		Name.startswith("avx512.storent.") ||
Name == "sse2.storel.dq" ||		Name == "sse2.storel.dq" ||
[443 lines not shown]	if (IsX86 && (Name.startswith("sse2.pcmpeq.") ||
ShuffleMask);		ShuffleMask);
}		}

bool Int2Double = (StringRef::npos != Name.find("cvtdq2"));		bool Int2Double = (StringRef::npos != Name.find("cvtdq2"));
if (Int2Double)		if (Int2Double)
Rep = Builder.CreateSIToFP(Rep, DstTy, "cvtdq2pd");		Rep = Builder.CreateSIToFP(Rep, DstTy, "cvtdq2pd");
else		else
Rep = Builder.CreateFPExt(Rep, DstTy, "cvtps2pd");		Rep = Builder.CreateFPExt(Rep, DstTy, "cvtps2pd");
} else if (IsX86 && (Name == "sse2.cvttps2dq" ||
Name.startswith("avx.cvtt."))) {
// Truncation (round to zero) float/double to i32 vector conversion.
Value *Src = CI->getArgOperand(0);
VectorType *DstTy = cast<VectorType>(CI->getType());
Rep = Builder.CreateFPToSI(Src, DstTy, "cvtt");
} else if (IsX86 && Name.startswith("sse4a.movnt.")) {		} else if (IsX86 && Name.startswith("sse4a.movnt.")) {
Module *M = F->getParent();		Module *M = F->getParent();
SmallVector<Metadata *, 1> Elts;		SmallVector<Metadata *, 1> Elts;
Elts.push_back(		Elts.push_back(
ConstantAsMetadata::get(ConstantInt::get(Type::getInt32Ty(C), 1)));		ConstantAsMetadata::get(ConstantInt::get(Type::getInt32Ty(C), 1)));
MDNode *Node = MDNode::get(C, Elts);		MDNode *Node = MDNode::get(C, Elts);

Value *Arg0 = CI->getArgOperand(0);		Value *Arg0 = CI->getArgOperand(0);
[816 lines not shown]

lib/Target/X86/X86InstrSSE.td

[2,003 lines not shown]	def CVTPD2DQrr : SDI<0xE6, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
"cvtpd2dq\t{$src, $dst|$dst, $src}",		"cvtpd2dq\t{$src, $dst|$dst, $src}",
[(set VR128:$dst, (int_x86_sse2_cvtpd2dq VR128:$src))],		[(set VR128:$dst, (int_x86_sse2_cvtpd2dq VR128:$src))],
IIC_SSE_CVT_PD_RR>, Sched<[WriteCvtF2I]>;		IIC_SSE_CVT_PD_RR>, Sched<[WriteCvtF2I]>;

// Convert with truncation packed single/double fp to doubleword		// Convert with truncation packed single/double fp to doubleword
// SSE2 packed instructions with XS prefix		// SSE2 packed instructions with XS prefix
def VCVTTPS2DQrr : VS2SI<0x5B, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),		def VCVTTPS2DQrr : VS2SI<0x5B, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
"cvttps2dq\t{$src, $dst|$dst, $src}",		"cvttps2dq\t{$src, $dst|$dst, $src}",
[], IIC_SSE_CVT_PS_RR>, VEX, Sched<[WriteCvtF2I]>;		[(set VR128:$dst,
		(int_x86_sse2_cvttps2dq VR128:$src))],
		IIC_SSE_CVT_PS_RR>, VEX, Sched<[WriteCvtF2I]>;
def VCVTTPS2DQrm : VS2SI<0x5B, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),		def VCVTTPS2DQrm : VS2SI<0x5B, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),
"cvttps2dq\t{$src, $dst|$dst, $src}",		"cvttps2dq\t{$src, $dst|$dst, $src}",
[], IIC_SSE_CVT_PS_RM>, VEX, Sched<[WriteCvtF2ILd]>;		[(set VR128:$dst, (int_x86_sse2_cvttps2dq
		(loadv4f32 addr:$src)))],
		IIC_SSE_CVT_PS_RM>, VEX, Sched<[WriteCvtF2ILd]>;
def VCVTTPS2DQYrr : VS2SI<0x5B, MRMSrcReg, (outs VR256:$dst), (ins VR256:$src),		def VCVTTPS2DQYrr : VS2SI<0x5B, MRMSrcReg, (outs VR256:$dst), (ins VR256:$src),
"cvttps2dq\t{$src, $dst|$dst, $src}",		"cvttps2dq\t{$src, $dst|$dst, $src}",
[], IIC_SSE_CVT_PS_RR>, VEX, VEX_L, Sched<[WriteCvtF2I]>;		[(set VR256:$dst,
		(int_x86_avx_cvtt_ps2dq_256 VR256:$src))],
		IIC_SSE_CVT_PS_RR>, VEX, VEX_L, Sched<[WriteCvtF2I]>;
def VCVTTPS2DQYrm : VS2SI<0x5B, MRMSrcMem, (outs VR256:$dst), (ins f256mem:$src),		def VCVTTPS2DQYrm : VS2SI<0x5B, MRMSrcMem, (outs VR256:$dst), (ins f256mem:$src),
"cvttps2dq\t{$src, $dst|$dst, $src}",		"cvttps2dq\t{$src, $dst|$dst, $src}",
[], IIC_SSE_CVT_PS_RM>, VEX, VEX_L,		[(set VR256:$dst, (int_x86_avx_cvtt_ps2dq_256
		(loadv8f32 addr:$src)))],
		IIC_SSE_CVT_PS_RM>, VEX, VEX_L,
Sched<[WriteCvtF2ILd]>;		Sched<[WriteCvtF2ILd]>;

def CVTTPS2DQrr : S2SI<0x5B, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),		def CVTTPS2DQrr : S2SI<0x5B, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
"cvttps2dq\t{$src, $dst|$dst, $src}",		"cvttps2dq\t{$src, $dst|$dst, $src}",
[], IIC_SSE_CVT_PS_RR>, Sched<[WriteCvtF2I]>;		[(set VR128:$dst, (int_x86_sse2_cvttps2dq VR128:$src))],
		IIC_SSE_CVT_PS_RR>, Sched<[WriteCvtF2I]>;
def CVTTPS2DQrm : S2SI<0x5B, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),		def CVTTPS2DQrm : S2SI<0x5B, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),
"cvttps2dq\t{$src, $dst|$dst, $src}",		"cvttps2dq\t{$src, $dst|$dst, $src}",
[], IIC_SSE_CVT_PS_RM>, Sched<[WriteCvtF2ILd]>;		[(set VR128:$dst,
		(int_x86_sse2_cvttps2dq (memopv4f32 addr:$src)))],
		IIC_SSE_CVT_PS_RM>, Sched<[WriteCvtF2ILd]>;

let Predicates = [HasAVX] in {		let Predicates = [HasAVX] in {
def : Pat<(int_x86_sse2_cvtdq2ps VR128:$src),		def : Pat<(int_x86_sse2_cvtdq2ps VR128:$src),
(VCVTDQ2PSrr VR128:$src)>;		(VCVTDQ2PSrr VR128:$src)>;
def : Pat<(int_x86_sse2_cvtdq2ps (bc_v4i32 (loadv2i64 addr:$src))),		def : Pat<(int_x86_sse2_cvtdq2ps (bc_v4i32 (loadv2i64 addr:$src))),
(VCVTDQ2PSrm addr:$src)>;		(VCVTDQ2PSrm addr:$src)>;
}		}

[53 lines not shown]	def VCVTTPD2DQXrm : VPDI<0xE6, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),
"cvttpd2dqx\t{$src, $dst|$dst, $src}",		"cvttpd2dqx\t{$src, $dst|$dst, $src}",
[(set VR128:$dst, (int_x86_sse2_cvttpd2dq		[(set VR128:$dst, (int_x86_sse2_cvttpd2dq
(loadv2f64 addr:$src)))],		(loadv2f64 addr:$src)))],
IIC_SSE_CVT_PD_RM>, VEX, Sched<[WriteCvtF2ILd]>;		IIC_SSE_CVT_PD_RM>, VEX, Sched<[WriteCvtF2ILd]>;

// YMM only		// YMM only
def VCVTTPD2DQYrr : VPDI<0xE6, MRMSrcReg, (outs VR128:$dst), (ins VR256:$src),		def VCVTTPD2DQYrr : VPDI<0xE6, MRMSrcReg, (outs VR128:$dst), (ins VR256:$src),
"cvttpd2dq{y}\t{$src, $dst|$dst, $src}",		"cvttpd2dq{y}\t{$src, $dst|$dst, $src}",
[], IIC_SSE_CVT_PD_RR>, VEX, VEX_L, Sched<[WriteCvtF2I]>;		[(set VR128:$dst,
		(int_x86_avx_cvtt_pd2dq_256 VR256:$src))],
		IIC_SSE_CVT_PD_RR>, VEX, VEX_L, Sched<[WriteCvtF2I]>;
def VCVTTPD2DQYrm : VPDI<0xE6, MRMSrcMem, (outs VR128:$dst), (ins f256mem:$src),		def VCVTTPD2DQYrm : VPDI<0xE6, MRMSrcMem, (outs VR128:$dst), (ins f256mem:$src),
"cvttpd2dq{y}\t{$src, $dst|$dst, $src}",		"cvttpd2dq{y}\t{$src, $dst|$dst, $src}",
[], IIC_SSE_CVT_PD_RM>, VEX, VEX_L, Sched<[WriteCvtF2ILd]>;		[(set VR128:$dst,
		(int_x86_avx_cvtt_pd2dq_256 (loadv4f64 addr:$src)))],
		IIC_SSE_CVT_PD_RM>, VEX, VEX_L, Sched<[WriteCvtF2ILd]>;
def : InstAlias<"vcvttpd2dq\t{$src, $dst|$dst, $src}",		def : InstAlias<"vcvttpd2dq\t{$src, $dst|$dst, $src}",
(VCVTTPD2DQYrr VR128:$dst, VR256:$src), 0>;		(VCVTTPD2DQYrr VR128:$dst, VR256:$src), 0>;

let Predicates = [HasAVX, NoVLX] in {		let Predicates = [HasAVX, NoVLX] in {
def : Pat<(v4i32 (fp_to_sint (v4f64 VR256:$src))),		def : Pat<(v4i32 (fp_to_sint (v4f64 VR256:$src))),
(VCVTTPD2DQYrr VR256:$src)>;		(VCVTTPD2DQYrr VR256:$src)>;
def : Pat<(v4i32 (fp_to_sint (loadv4f64 addr:$src))),		def : Pat<(v4i32 (fp_to_sint (loadv4f64 addr:$src))),
(VCVTTPD2DQYrm addr:$src)>;		(VCVTTPD2DQYrm addr:$src)>;
[6,728 lines not shown]

test/CodeGen/X86/avx-intrinsics-fast-isel.ll

	[675 lines not shown]
	; X32-NEXT: vzeroupper			; X32-NEXT: vzeroupper
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test_mm256_cvttpd_epi32:			; X64-LABEL: test_mm256_cvttpd_epi32:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: vcvttpd2dqy %ymm0, %xmm0			; X64-NEXT: vcvttpd2dqy %ymm0, %xmm0
	; X64-NEXT: vzeroupper			; X64-NEXT: vzeroupper
	; X64-NEXT: retq			; X64-NEXT: retq
	%cvt = fptosi <4 x double> %a0 to <4 x i32>			%cvt = call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %a0)
	%res = bitcast <4 x i32> %cvt to <2 x i64>			%res = bitcast <4 x i32> %cvt to <2 x i64>
	ret <2 x i64> %res			ret <2 x i64> %res
	}			}
				declare <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double>) nounwind readnone

	define <4 x i64> @test_mm256_cvttps_epi32(<8 x float> %a0) nounwind {			define <4 x i64> @test_mm256_cvttps_epi32(<8 x float> %a0) nounwind {
	; X32-LABEL: test_mm256_cvttps_epi32:			; X32-LABEL: test_mm256_cvttps_epi32:
	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: vcvttps2dq %ymm0, %ymm0			; X32-NEXT: vcvttps2dq %ymm0, %ymm0
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test_mm256_cvttps_epi32:			; X64-LABEL: test_mm256_cvttps_epi32:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: vcvttps2dq %ymm0, %ymm0			; X64-NEXT: vcvttps2dq %ymm0, %ymm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%cvt = fptosi <8 x float> %a0 to <8 x i32>			%cvt = call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %a0)
	%res = bitcast <8 x i32> %cvt to <4 x i64>			%res = bitcast <8 x i32> %cvt to <4 x i64>
	ret <4 x i64> %res			ret <4 x i64> %res
	}			}
				declare <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float>) nounwind readnone

	define <4 x double> @test_mm256_div_pd(<4 x double> %a0, <4 x double> %a1) nounwind {			define <4 x double> @test_mm256_div_pd(<4 x double> %a0, <4 x double> %a1) nounwind {
	; X32-LABEL: test_mm256_div_pd:			; X32-LABEL: test_mm256_div_pd:
	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: vdivpd %ymm1, %ymm0, %ymm0			; X32-NEXT: vdivpd %ymm1, %ymm0, %ymm0
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test_mm256_div_pd:			; X64-LABEL: test_mm256_div_pd:
	[3,068 lines not shown]

test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll

	[353 lines not shown]
	; CHECK-NEXT: vcvtps2pd %xmm0, %ymm0			; CHECK-NEXT: vcvtps2pd %xmm0, %ymm0
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%res = call <4 x double> @llvm.x86.avx.cvt.ps2.pd.256(<4 x float> %a0) ; <<4 x double>> [#uses=1]			%res = call <4 x double> @llvm.x86.avx.cvt.ps2.pd.256(<4 x float> %a0) ; <<4 x double>> [#uses=1]
	ret <4 x double> %res			ret <4 x double> %res
	}			}
	declare <4 x double> @llvm.x86.avx.cvt.ps2.pd.256(<4 x float>) nounwind readnone			declare <4 x double> @llvm.x86.avx.cvt.ps2.pd.256(<4 x float>) nounwind readnone


	define <4 x i32> @test_x86_avx_cvtt_pd2dq_256(<4 x double> %a0) {
	; CHECK-LABEL: test_x86_avx_cvtt_pd2dq_256:
	; CHECK: ## BB#0:
	; CHECK-NEXT: vcvttpd2dqy %ymm0, %xmm0
	; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: retl
	%res = call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %a0) ; <<4 x i32>> [#uses=1]
	ret <4 x i32> %res
	}
	declare <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double>) nounwind readnone


	define <8 x i32> @test_x86_avx_cvtt_ps2dq_256(<8 x float> %a0) {
	; CHECK-LABEL: test_x86_avx_cvtt_ps2dq_256:
	; CHECK: ## BB#0:
	; CHECK-NEXT: vcvttps2dq %ymm0, %ymm0
	; CHECK-NEXT: retl
	%res = call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %a0) ; <<8 x i32>> [#uses=1]
	ret <8 x i32> %res
	}
	declare <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float>) nounwind readnone


	define void @test_x86_sse2_storeu_dq(i8* %a0, <16 x i8> %a1) {			define void @test_x86_sse2_storeu_dq(i8* %a0, <16 x i8> %a1) {
	; add operation forces the execution domain.			; add operation forces the execution domain.
	; CHECK-LABEL: test_x86_sse2_storeu_dq:			; CHECK-LABEL: test_x86_sse2_storeu_dq:
	; CHECK: ## BB#0:			; CHECK: ## BB#0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: vpaddb LCPI34_0, %xmm0, %xmm0			; CHECK-NEXT: vpaddb LCPI32_0, %xmm0, %xmm0
	; CHECK-NEXT: vmovdqu %xmm0, (%eax)			; CHECK-NEXT: vmovdqu %xmm0, (%eax)
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%a2 = add <16 x i8> %a1, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%a2 = add <16 x i8> %a1, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	call void @llvm.x86.sse2.storeu.dq(i8* %a0, <16 x i8> %a2)			call void @llvm.x86.sse2.storeu.dq(i8* %a0, <16 x i8> %a2)
	ret void			ret void
	}			}
	declare void @llvm.x86.sse2.storeu.dq(i8*, <16 x i8>) nounwind			declare void @llvm.x86.sse2.storeu.dq(i8*, <16 x i8>) nounwind

	[123 lines not shown]

test/CodeGen/X86/avx-intrinsics-x86.ll

[3,425 lines not shown]
; AVX512VL-NEXT: vcvtdq2ps %ymm0, %ymm0		; AVX512VL-NEXT: vcvtdq2ps %ymm0, %ymm0
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call <8 x float> @llvm.x86.avx.cvtdq2.ps.256(<8 x i32> %a0) ; <<8 x float>> [#uses=1]		%res = call <8 x float> @llvm.x86.avx.cvtdq2.ps.256(<8 x i32> %a0) ; <<8 x float>> [#uses=1]
ret <8 x float> %res		ret <8 x float> %res
}		}
declare <8 x float> @llvm.x86.avx.cvtdq2.ps.256(<8 x i32>) nounwind readnone		declare <8 x float> @llvm.x86.avx.cvtdq2.ps.256(<8 x i32>) nounwind readnone


		define <4 x i32> @test_x86_avx_cvtt_pd2dq_256(<4 x double> %a0) {
		; AVX-LABEL: test_x86_avx_cvtt_pd2dq_256:
		; AVX: ## BB#0:
		; AVX-NEXT: vcvttpd2dqy %ymm0, %xmm0
		; AVX-NEXT: vzeroupper
		; AVX-NEXT: retl
		;
		; AVX512VL-LABEL: test_x86_avx_cvtt_pd2dq_256:
		; AVX512VL: ## BB#0:
		; AVX512VL-NEXT: vcvttpd2dqy %ymm0, %xmm0
		; AVX512VL-NEXT: retl
		%res = call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %a0) ; <<4 x i32>> [#uses=1]
		ret <4 x i32> %res
		}
		declare <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double>) nounwind readnone


		define <8 x i32> @test_x86_avx_cvtt_ps2dq_256(<8 x float> %a0) {
		; AVX-LABEL: test_x86_avx_cvtt_ps2dq_256:
		; AVX: ## BB#0:
		; AVX-NEXT: vcvttps2dq %ymm0, %ymm0
		; AVX-NEXT: retl
		;
		; AVX512VL-LABEL: test_x86_avx_cvtt_ps2dq_256:
		; AVX512VL: ## BB#0:
		; AVX512VL-NEXT: vcvttps2dq %ymm0, %ymm0
		; AVX512VL-NEXT: retl
		%res = call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %a0) ; <<8 x i32>> [#uses=1]
		ret <8 x i32> %res
		}
		declare <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float>) nounwind readnone


define <8 x float> @test_x86_avx_dp_ps_256(<8 x float> %a0, <8 x float> %a1) {		define <8 x float> @test_x86_avx_dp_ps_256(<8 x float> %a0, <8 x float> %a1) {
; AVX-LABEL: test_x86_avx_dp_ps_256:		; AVX-LABEL: test_x86_avx_dp_ps_256:
; AVX: ## BB#0:		; AVX: ## BB#0:
; AVX-NEXT: vdpps $7, %ymm1, %ymm0, %ymm0		; AVX-NEXT: vdpps $7, %ymm1, %ymm0, %ymm0
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_avx_dp_ps_256:		; AVX512VL-LABEL: test_x86_avx_dp_ps_256:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
[1,105 lines not shown]	; AVX512VL-NEXT: retl
ret i32 %tmp		ret i32 %tmp
}		}
declare i32 @llvm.x86.sse42.crc32.32.32(i32, i32) nounwind		declare i32 @llvm.x86.sse42.crc32.32.32(i32, i32) nounwind

define void @movnt_dq(i8* %p, <2 x i64> %a1) nounwind {		define void @movnt_dq(i8* %p, <2 x i64> %a1) nounwind {
; AVX-LABEL: movnt_dq:		; AVX-LABEL: movnt_dq:
; AVX: ## BB#0:		; AVX: ## BB#0:
; AVX-NEXT: movl {{[0-9]+}}(%esp), %eax		; AVX-NEXT: movl {{[0-9]+}}(%esp), %eax
; AVX-NEXT: vpaddq LCPI254_0, %xmm0, %xmm0		; AVX-NEXT: vpaddq LCPI256_0, %xmm0, %xmm0
; AVX-NEXT: vmovntdq %ymm0, (%eax)		; AVX-NEXT: vmovntdq %ymm0, (%eax)
; AVX-NEXT: vzeroupper		; AVX-NEXT: vzeroupper
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: movnt_dq:		; AVX512VL-LABEL: movnt_dq:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
; AVX512VL-NEXT: movl {{[0-9]+}}(%esp), %eax		; AVX512VL-NEXT: movl {{[0-9]+}}(%esp), %eax
; AVX512VL-NEXT: vpaddq LCPI254_0, %xmm0, %xmm0		; AVX512VL-NEXT: vpaddq LCPI256_0, %xmm0, %xmm0
; AVX512VL-NEXT: vmovntdq %ymm0, (%eax)		; AVX512VL-NEXT: vmovntdq %ymm0, (%eax)
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%a2 = add <2 x i64> %a1, <i64 1, i64 1>		%a2 = add <2 x i64> %a1, <i64 1, i64 1>
%a3 = shufflevector <2 x i64> %a2, <2 x i64> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		%a3 = shufflevector <2 x i64> %a2, <2 x i64> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
tail call void @llvm.x86.avx.movnt.dq.256(i8* %p, <4 x i64> %a3) nounwind		tail call void @llvm.x86.avx.movnt.dq.256(i8* %p, <4 x i64> %a3) nounwind
ret void		ret void
}		}
declare void @llvm.x86.avx.movnt.dq.256(i8*, <4 x i64>) nounwind		declare void @llvm.x86.avx.movnt.dq.256(i8*, <4 x i64>) nounwind
[... 59 lines hidden ...]

test/CodeGen/X86/sse-intrinsics-fast-isel-x86_64.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -fast-isel -mtriple=x86_64-unknown-unknown -mattr=+sse | FileCheck %s --check-prefix=X64			; RUN: llc < %s -fast-isel -mtriple=x86_64-unknown-unknown -mattr=+sse | FileCheck %s --check-prefix=X64

	; NOTE: This should use IR equivalent to what is generated by clang/test/CodeGen/sse-builtins.c			; NOTE: This should use IR equivalent to what is generated by clang/test/CodeGen/sse-builtins.c

	define <4 x float> @test_mm_cvtsi64_ss(<4 x float> %a0, i64 %a1) nounwind {			define <4 x float> @test_mm_cvtsi64_ss(<4 x float> %a0, i64 %a1) nounwind {
	; X64-LABEL: test_mm_cvtsi64_ss:			; X64-LABEL: test_mm_cvtsi64_ss:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: cvtsi2ssq %rdi, %xmm1			; X64-NEXT: cvtsi2ssq %rdi, %xmm0
	; X64-NEXT: movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
	; X64-NEXT: retq			; X64-NEXT: retq
	%cvt = sitofp i64 %a1 to float			%res = call <4 x float> @llvm.x86.sse.cvtsi642ss(<4 x float> %a0, i64 %a1)
	%res = insertelement <4 x float> %a0, float %cvt, i32 0
	ret <4 x float> %res			ret <4 x float> %res
	}			}
				declare <4 x float> @llvm.x86.sse.cvtsi642ss(<4 x float>, i64) nounwind readnone

	define i64 @test_mm_cvtss_si64(<4 x float> %a0) nounwind {			define i64 @test_mm_cvtss_si64(<4 x float> %a0) nounwind {
	; X64-LABEL: test_mm_cvtss_si64:			; X64-LABEL: test_mm_cvtss_si64:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: cvtss2si %xmm0, %rax			; X64-NEXT: cvtss2si %xmm0, %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%res = call i64 @llvm.x86.sse.cvtss2si64(<4 x float> %a0)			%res = call i64 @llvm.x86.sse.cvtss2si64(<4 x float> %a0)
	ret i64 %res			ret i64 %res
	}			}
	declare i64 @llvm.x86.sse.cvtss2si64(<4 x float>) nounwind readnone			declare i64 @llvm.x86.sse.cvtss2si64(<4 x float>) nounwind readnone

	define i64 @test_mm_cvttss_si64(<4 x float> %a0) nounwind {			define i64 @test_mm_cvttss_si64(<4 x float> %a0) nounwind {
	; X64-LABEL: test_mm_cvttss_si64:			; X64-LABEL: test_mm_cvttss_si64:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: cvttss2si %xmm0, %rax			; X64-NEXT: cvttss2si %xmm0, %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%cvt = extractelement <4 x float> %a0, i32 0			%res = call i64 @llvm.x86.sse.cvttss2si64(<4 x float> %a0)
	%res = fptosi float %cvt to i64
	ret i64 %res			ret i64 %res
	}			}
				declare i64 @llvm.x86.sse.cvttss2si64(<4 x float>) nounwind readnone
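
For readers following the hunks above: the summary states that INF/NAN/out-of-range inputs to the truncating conversions are guaranteed to produce 0x80000000, which is why plain `fptosi` is not an equivalent replacement. The following is a minimal portable C++ sketch of that behaviour for the 32-bit scalar case. It is only a model under that stated assumption, not code from this patch or from LLVM, and `model_cvttss2si` is a hypothetical name chosen for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical model (not LLVM code) of the 32-bit truncating conversion
// described in the summary: in-range inputs truncate toward zero, while
// NaN, infinities and out-of-range values all yield 0x80000000 (INT32_MIN).
std::int32_t model_cvttss2si(float x) {
    const float upper = 2147483648.0f;   // 2^31, first value that is too large
    const float lower = -2147483648.0f;  // -2^31, still representable as int32
    if (std::isnan(x) || x >= upper || x < lower) {
        return std::numeric_limits<std::int32_t>::min();
    }
    // Safe here: the value is known to be in range, so the cast is defined
    // and truncates toward zero.
    return static_cast<std::int32_t>(x);
}
```

A constant folder that treats the out-of-range case as zero or undef would disagree with this model, which is the mismatch the patch avoids by keeping the x86-specific intrinsics.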

test/CodeGen/X86/sse-intrinsics-fast-isel.ll

[... 701 lines hidden ...]
; X64-NEXT: retq
%res = call i32 @llvm.x86.sse.cvtss2si(<4 x float> %a0)		%res = call i32 @llvm.x86.sse.cvtss2si(<4 x float> %a0)
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.cvtss2si(<4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.cvtss2si(<4 x float>) nounwind readnone

define <4 x float> @test_mm_cvtsi32_ss(<4 x float> %a0, i32 %a1) nounwind {		define <4 x float> @test_mm_cvtsi32_ss(<4 x float> %a0, i32 %a1) nounwind {
; X32-LABEL: test_mm_cvtsi32_ss:		; X32-LABEL: test_mm_cvtsi32_ss:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-NEXT: cvtsi2ssl {{[0-9]+}}(%esp), %xmm0
; X32-NEXT: cvtsi2ssl %eax, %xmm1
; X32-NEXT: movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_cvtsi32_ss:		; X64-LABEL: test_mm_cvtsi32_ss:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: cvtsi2ssl %edi, %xmm1		; X64-NEXT: cvtsi2ssl %edi, %xmm0
; X64-NEXT: movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
; X64-NEXT: retq		; X64-NEXT: retq
%cvt = sitofp i32 %a1 to float		%res = call <4 x float> @llvm.x86.sse.cvtsi2ss(<4 x float> %a0, i32 %a1)
%res = insertelement <4 x float> %a0, float %cvt, i32 0
ret <4 x float> %res		ret <4 x float> %res
}		}
		declare <4 x float> @llvm.x86.sse.cvtsi2ss(<4 x float>, i32) nounwind readnone

define float @test_mm_cvtss_f32(<4 x float> %a0) nounwind {		define float @test_mm_cvtss_f32(<4 x float> %a0) nounwind {
; X32-LABEL: test_mm_cvtss_f32:		; X32-LABEL: test_mm_cvtss_f32:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: movss %xmm0, (%esp)		; X32-NEXT: movss %xmm0, (%esp)
; X32-NEXT: flds (%esp)		; X32-NEXT: flds (%esp)
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
[... 25 lines hidden ...]
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: cvttss2si %xmm0, %eax		; X32-NEXT: cvttss2si %xmm0, %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_cvttss_si:		; X64-LABEL: test_mm_cvttss_si:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: cvttss2si %xmm0, %eax		; X64-NEXT: cvttss2si %xmm0, %eax
; X64-NEXT: retq		; X64-NEXT: retq
%cvt = extractelement <4 x float> %a0, i32 0		%res = call i32 @llvm.x86.sse.cvttss2si(<4 x float> %a0)
%res = fptosi float %cvt to i32
ret i32 %res		ret i32 %res
}		}
		declare i32 @llvm.x86.sse.cvttss2si(<4 x float>) nounwind readnone

define i32 @test_mm_cvttss_si32(<4 x float> %a0) nounwind {		define i32 @test_mm_cvttss_si32(<4 x float> %a0) nounwind {
; X32-LABEL: test_mm_cvttss_si32:		; X32-LABEL: test_mm_cvttss_si32:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: cvttss2si %xmm0, %eax		; X32-NEXT: cvttss2si %xmm0, %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_cvttss_si32:		; X64-LABEL: test_mm_cvttss_si32:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: cvttss2si %xmm0, %eax		; X64-NEXT: cvttss2si %xmm0, %eax
; X64-NEXT: retq		; X64-NEXT: retq
%cvt = extractelement <4 x float> %a0, i32 0		%res = call i32 @llvm.x86.sse.cvttss2si(<4 x float> %a0)
%res = fptosi float %cvt to i32
ret i32 %res		ret i32 %res
}		}

define <4 x float> @test_mm_div_ps(<4 x float> %a0, <4 x float> %a1) nounwind {		define <4 x float> @test_mm_div_ps(<4 x float> %a0, <4 x float> %a1) nounwind {
; X32-LABEL: test_mm_div_ps:		; X32-LABEL: test_mm_div_ps:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: divps %xmm1, %xmm0		; X32-NEXT: divps %xmm1, %xmm0
; X32-NEXT: retl		; X32-NEXT: retl
[... 1,514 lines hidden ...]

test/CodeGen/X86/sse2-intrinsics-fast-isel-x86_64.ll

	[... 19 lines hidden ...]
	; X64-NEXT: retq			; X64-NEXT: retq
	%res = extractelement <2 x i64> %a0, i32 0			%res = extractelement <2 x i64> %a0, i32 0
	ret i64 %res			ret i64 %res
	}			}

	define <2 x double> @test_mm_cvtsi64_sd(<2 x double> %a0, i64 %a1) nounwind {			define <2 x double> @test_mm_cvtsi64_sd(<2 x double> %a0, i64 %a1) nounwind {
	; X64-LABEL: test_mm_cvtsi64_sd:			; X64-LABEL: test_mm_cvtsi64_sd:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: cvtsi2sdq %rdi, %xmm1			; X64-NEXT: cvtsi2sdq %rdi, %xmm0
	; X64-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
	; X64-NEXT: retq			; X64-NEXT: retq
	%cvt = sitofp i64 %a1 to double			%res = call <2 x double> @llvm.x86.sse2.cvtsi642sd(<2 x double> %a0, i64 %a1)
	%res = insertelement <2 x double> %a0, double %cvt, i32 0
	ret <2 x double> %res			ret <2 x double> %res
	}			}
				declare <2 x double> @llvm.x86.sse2.cvtsi642sd(<2 x double>, i64) nounwind readnone

	define <2 x i64> @test_mm_cvtsi64_si128(i64 %a0) nounwind {			define <2 x i64> @test_mm_cvtsi64_si128(i64 %a0) nounwind {
	; X64-LABEL: test_mm_cvtsi64_si128:			; X64-LABEL: test_mm_cvtsi64_si128:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: movd %rdi, %xmm0			; X64-NEXT: movd %rdi, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%res0 = insertelement <2 x i64> undef, i64 %a0, i32 0			%res0 = insertelement <2 x i64> undef, i64 %a0, i32 0
	%res1 = insertelement <2 x i64> %res0, i64 0, i32 1			%res1 = insertelement <2 x i64> %res0, i64 0, i32 1
	ret <2 x i64> %res1			ret <2 x i64> %res1
	}			}

	define i64 @test_mm_cvttsd_si64(<2 x double> %a0) nounwind {			define i64 @test_mm_cvttsd_si64(<2 x double> %a0) nounwind {
	; X64-LABEL: test_mm_cvttsd_si64:			; X64-LABEL: test_mm_cvttsd_si64:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: cvttsd2si %xmm0, %rax			; X64-NEXT: cvttsd2si %xmm0, %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%ext = extractelement <2 x double> %a0, i32 0			%res = call i64 @llvm.x86.sse2.cvttsd2si64(<2 x double> %a0)
	%res = fptosi double %ext to i64
	ret i64 %res			ret i64 %res
	}			}
				declare i64 @llvm.x86.sse2.cvttsd2si64(<2 x double>) nounwind readnone

	define <2 x i64> @test_mm_loadu_si64(i64* %a0) nounwind {			define <2 x i64> @test_mm_loadu_si64(i64* %a0) nounwind {
	; X64-LABEL: test_mm_loadu_si64:			; X64-LABEL: test_mm_loadu_si64:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X64-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; X64-NEXT: retq			; X64-NEXT: retq
	%ld = load i64, i64* %a0, align 1			%ld = load i64, i64* %a0, align 1
	%res0 = insertelement <2 x i64> undef, i64 %ld, i32 0			%res0 = insertelement <2 x i64> undef, i64 %ld, i32 0
	[... 14 lines hidden ...]

test/CodeGen/X86/sse2-intrinsics-fast-isel.ll

	[... 1,202 lines hidden ...]
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: cvtsd2si %xmm0, %eax			; X64-NEXT: cvtsd2si %xmm0, %eax
	; X64-NEXT: retq			; X64-NEXT: retq
	%res = call i32 @llvm.x86.sse2.cvtsd2si(<2 x double> %a0)			%res = call i32 @llvm.x86.sse2.cvtsd2si(<2 x double> %a0)
	ret i32 %res			ret i32 %res
	}			}
	declare i32 @llvm.x86.sse2.cvtsd2si(<2 x double>) nounwind readnone			declare i32 @llvm.x86.sse2.cvtsd2si(<2 x double>) nounwind readnone

				define <4 x float> @test_mm_cvtsd_ss(<4 x float> %a0, <2 x double> %a1) {
				; X32-LABEL: test_mm_cvtsd_ss:
				; X32: # BB#0:
				; X32-NEXT: cvtsd2ss %xmm1, %xmm0
				; X32-NEXT: retl
				;
				; X64-LABEL: test_mm_cvtsd_ss:
				; X64: # BB#0:
				; X64-NEXT: cvtsd2ss %xmm1, %xmm0
				; X64-NEXT: retq
				%res = call <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float> %a0, <2 x double> %a1)
				ret <4 x float> %res
				}
				declare <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float>, <2 x double>) nounwind readnone

	define i32 @test_mm_cvtsi128_si32(<2 x i64> %a0) nounwind {			define i32 @test_mm_cvtsi128_si32(<2 x i64> %a0) nounwind {
	; X32-LABEL: test_mm_cvtsi128_si32:			; X32-LABEL: test_mm_cvtsi128_si32:
	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: movd %xmm0, %eax			; X32-NEXT: movd %xmm0, %eax
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test_mm_cvtsi128_si32:			; X64-LABEL: test_mm_cvtsi128_si32:
	; X64: # BB#0:			; X64: # BB#0:
	[... 79 lines hidden ...]
	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: cvttps2dq %xmm0, %xmm0			; X32-NEXT: cvttps2dq %xmm0, %xmm0
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test_mm_cvttps_epi32:			; X64-LABEL: test_mm_cvttps_epi32:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: cvttps2dq %xmm0, %xmm0			; X64-NEXT: cvttps2dq %xmm0, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%res = fptosi <4 x float> %a0 to <4 x i32>			%res = call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %a0)
	%bc = bitcast <4 x i32> %res to <2 x i64>			%bc = bitcast <4 x i32> %res to <2 x i64>
	ret <2 x i64> %bc			ret <2 x i64> %bc
	}			}
				declare <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float>) nounwind readnone

	define i32 @test_mm_cvttsd_si32(<2 x double> %a0) nounwind {			define i32 @test_mm_cvttsd_si32(<2 x double> %a0) nounwind {
	; X32-LABEL: test_mm_cvttsd_si32:			; X32-LABEL: test_mm_cvttsd_si32:
	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: cvttsd2si %xmm0, %eax			; X32-NEXT: cvttsd2si %xmm0, %eax
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test_mm_cvttsd_si32:			; X64-LABEL: test_mm_cvttsd_si32:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: cvttsd2si %xmm0, %eax			; X64-NEXT: cvttsd2si %xmm0, %eax
	; X64-NEXT: retq			; X64-NEXT: retq
	%ext = extractelement <2 x double> %a0, i32 0			%res = call i32 @llvm.x86.sse2.cvttsd2si(<2 x double> %a0)
	%res = fptosi double %ext to i32
	ret i32 %res			ret i32 %res
	}			}
				declare i32 @llvm.x86.sse2.cvttsd2si(<2 x double>) nounwind readnone

	define <2 x double> @test_mm_div_pd(<2 x double> %a0, <2 x double> %a1) nounwind {			define <2 x double> @test_mm_div_pd(<2 x double> %a0, <2 x double> %a1) nounwind {
	; X32-LABEL: test_mm_div_pd:			; X32-LABEL: test_mm_div_pd:
	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: divpd %xmm1, %xmm0			; X32-NEXT: divpd %xmm1, %xmm0
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test_mm_div_pd:			; X64-LABEL: test_mm_div_pd:
	[... 2,517 lines hidden ...]

test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll

	[... 60 lines hidden ...]
	; CHECK-NEXT: cvtps2pd %xmm0, %xmm0			; CHECK-NEXT: cvtps2pd %xmm0, %xmm0
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%res = call <2 x double> @llvm.x86.sse2.cvtps2pd(<4 x float> %a0) ; <<2 x double>> [#uses=1]			%res = call <2 x double> @llvm.x86.sse2.cvtps2pd(<4 x float> %a0) ; <<2 x double>> [#uses=1]
	ret <2 x double> %res			ret <2 x double> %res
	}			}
	declare <2 x double> @llvm.x86.sse2.cvtps2pd(<4 x float>) nounwind readnone			declare <2 x double> @llvm.x86.sse2.cvtps2pd(<4 x float>) nounwind readnone


	define <4 x i32> @test_x86_sse2_cvttps2dq(<4 x float> %a0) {
	; CHECK-LABEL: test_x86_sse2_cvttps2dq:
	; CHECK: ## BB#0:
	; CHECK-NEXT: cvttps2dq %xmm0, %xmm0
	; CHECK-NEXT: retl
	%res = call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %a0) ; <<4 x i32>> [#uses=1]
	ret <4 x i32> %res
	}
	declare <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float>) nounwind readnone


	define void @test_x86_sse2_storel_dq(i8* %a0, <4 x i32> %a1) {			define void @test_x86_sse2_storel_dq(i8* %a0, <4 x i32> %a1) {
	; CHECK-LABEL: test_x86_sse2_storel_dq:			; CHECK-LABEL: test_x86_sse2_storel_dq:
	; CHECK: ## BB#0:			; CHECK: ## BB#0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movlps %xmm0, (%eax)			; CHECK-NEXT: movlps %xmm0, (%eax)
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	call void @llvm.x86.sse2.storel.dq(i8* %a0, <4 x i32> %a1)			call void @llvm.x86.sse2.storel.dq(i8* %a0, <4 x i32> %a1)
	ret void			ret void
	}			}
	declare void @llvm.x86.sse2.storel.dq(i8*, <4 x i32>) nounwind			declare void @llvm.x86.sse2.storel.dq(i8*, <4 x i32>) nounwind


	define void @test_x86_sse2_storeu_dq(i8* %a0, <16 x i8> %a1) {			define void @test_x86_sse2_storeu_dq(i8* %a0, <16 x i8> %a1) {
	; add operation forces the execution domain.			; add operation forces the execution domain.
	; CHECK-LABEL: test_x86_sse2_storeu_dq:			; CHECK-LABEL: test_x86_sse2_storeu_dq:
	; CHECK: ## BB#0:			; CHECK: ## BB#0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: paddb LCPI8_0, %xmm0			; CHECK-NEXT: paddb LCPI7_0, %xmm0
	; CHECK-NEXT: movdqu %xmm0, (%eax)			; CHECK-NEXT: movdqu %xmm0, (%eax)
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%a2 = add <16 x i8> %a1, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%a2 = add <16 x i8> %a1, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	call void @llvm.x86.sse2.storeu.dq(i8* %a0, <16 x i8> %a2)			call void @llvm.x86.sse2.storeu.dq(i8* %a0, <16 x i8> %a2)
	ret void			ret void
	}			}
	declare void @llvm.x86.sse2.storeu.dq(i8*, <16 x i8>) nounwind			declare void @llvm.x86.sse2.storeu.dq(i8*, <16 x i8>) nounwind

	[... 94 lines hidden ...]

test/CodeGen/X86/sse2-intrinsics-x86.ll

	; NOTE: Assertions have been autogenerated by update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i386-apple-darwin -mattr=-avx,+sse2 | FileCheck %s --check-prefix=SSE			; RUN: llc < %s -mtriple=i386-apple-darwin -mattr=-avx,+sse2 | FileCheck %s --check-prefix=SSE
	; RUN: llc < %s -mtriple=i386-apple-darwin -mcpu=knl | FileCheck %s --check-prefix=KNL			; RUN: llc < %s -mtriple=i386-apple-darwin -mcpu=knl | FileCheck %s --check-prefix=KNL

	define <2 x double> @test_x86_sse2_add_sd(<2 x double> %a0, <2 x double> %a1) {			define <2 x double> @test_x86_sse2_add_sd(<2 x double> %a0, <2 x double> %a1) {
	; SSE-LABEL: test_x86_sse2_add_sd:			; SSE-LABEL: test_x86_sse2_add_sd:
	; SSE: ## BB#0:			; SSE: ## BB#0:
	; SSE-NEXT: addsd %xmm1, %xmm0			; SSE-NEXT: addsd %xmm1, %xmm0
	; SSE-NEXT: retl			; SSE-NEXT: retl
	[... 307 lines hidden ...]
	; KNL-NEXT: vcvttpd2dq %xmm0, %xmm0			; KNL-NEXT: vcvttpd2dq %xmm0, %xmm0
	; KNL-NEXT: retl			; KNL-NEXT: retl
	%res = call <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double> %a0) ; <<4 x i32>> [#uses=1]			%res = call <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double> %a0) ; <<4 x i32>> [#uses=1]
	ret <4 x i32> %res			ret <4 x i32> %res
	}			}
	declare <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double>) nounwind readnone			declare <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double>) nounwind readnone


				define <4 x i32> @test_x86_sse2_cvttps2dq(<4 x float> %a0) {
				; SSE-LABEL: test_x86_sse2_cvttps2dq:
				; SSE: ## BB#0:
				; SSE-NEXT: cvttps2dq %xmm0, %xmm0
				; SSE-NEXT: retl
				;
				; KNL-LABEL: test_x86_sse2_cvttps2dq:
				; KNL: ## BB#0:
				; KNL-NEXT: vcvttps2dq %xmm0, %xmm0
				; KNL-NEXT: retl
				%res = call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %a0) ; <<4 x i32>> [#uses=1]
				ret <4 x i32> %res
				}
				declare <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float>) nounwind readnone


	define i32 @test_x86_sse2_cvttsd2si(<2 x double> %a0) {			define i32 @test_x86_sse2_cvttsd2si(<2 x double> %a0) {
	; SSE-LABEL: test_x86_sse2_cvttsd2si:			; SSE-LABEL: test_x86_sse2_cvttsd2si:
	; SSE: ## BB#0:			; SSE: ## BB#0:
	; SSE-NEXT: cvttsd2si %xmm0, %eax			; SSE-NEXT: cvttsd2si %xmm0, %eax
	; SSE-NEXT: retl			; SSE-NEXT: retl
	;			;
	; KNL-LABEL: test_x86_sse2_cvttsd2si:			; KNL-LABEL: test_x86_sse2_cvttsd2si:
	; KNL: ## BB#0:			; KNL: ## BB#0:
	[... 936 lines hidden ...]

test/Transforms/ConstProp/calls.ll

[... 187 lines hidden ...]
entry:
%sum02 = add i32 %i0, %i2		%sum02 = add i32 %i0, %i2
%sum13 = add i64 %i1, %i3		%sum13 = add i64 %i1, %i3
%cmp02 = icmp eq i32 %sum02, 10		%cmp02 = icmp eq i32 %sum02, 10
%cmp13 = icmp eq i64 %sum13, 10		%cmp13 = icmp eq i64 %sum13, 10
%b = and i1 %cmp02, %cmp13		%b = and i1 %cmp02, %cmp13
ret i1 %b		ret i1 %b
}		}

; TODO: Inexact values should not fold as they are dependent on rounding mode		; Inexact values should not fold as they are dependent on rounding mode
define i1 @test_sse_cvts_inexact() nounwind readnone {		define i1 @test_sse_cvts_inexact() nounwind readnone {
; CHECK-LABEL: @test_sse_cvts_inexact(		; CHECK-LABEL: @test_sse_cvts_inexact(
; CHECK-NOT: call		; CHECK: call
; CHECK: ret i1 true		; CHECK: call
		; CHECK: call
		; CHECK: call
entry:		entry:
%i0 = tail call i32 @llvm.x86.sse.cvtss2si(<4 x float> <float 1.75, float undef, float undef, float undef>) nounwind		%i0 = tail call i32 @llvm.x86.sse.cvtss2si(<4 x float> <float 1.75, float undef, float undef, float undef>) nounwind
%i1 = tail call i64 @llvm.x86.sse.cvtss2si64(<4 x float> <float 1.75, float undef, float undef, float undef>) nounwind		%i1 = tail call i64 @llvm.x86.sse.cvtss2si64(<4 x float> <float 1.75, float undef, float undef, float undef>) nounwind
%i2 = call i32 @llvm.x86.sse2.cvtsd2si(<2 x double> <double 1.75, double undef>) nounwind		%i2 = call i32 @llvm.x86.sse2.cvtsd2si(<2 x double> <double 1.75, double undef>) nounwind
%i3 = call i64 @llvm.x86.sse2.cvtsd2si64(<2 x double> <double 1.75, double undef>) nounwind		%i3 = call i64 @llvm.x86.sse2.cvtsd2si64(<2 x double> <double 1.75, double undef>) nounwind
%sum02 = add i32 %i0, %i2		%sum02 = add i32 %i0, %i2
%sum13 = add i64 %i1, %i3		%sum13 = add i64 %i1, %i3
%cmp02 = icmp eq i32 %sum02, 4		%cmp02 = icmp eq i32 %sum02, 4
[... 184 lines hidden ...]
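
The `test_sse_cvts_inexact` change above relies on the point that a non-truncating conversion of an inexact value such as 1.75 depends on the rounding mode in effect at run time, so folding it at compile time bakes in one particular mode. A rough portable illustration follows, using `std::lrint` and `<cfenv>` as stand-ins for the hardware conversion; this substitution is an assumption made for the sketch, and `convert_with_rounding` is a hypothetical helper, not part of LLVM or of this patch.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper (not LLVM code): convert `value` to an integer under
// the given <cfenv> rounding mode, then restore the previous mode.
long convert_with_rounding(double value, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    // The volatile accesses keep the compiler from folding the conversion at
    // compile time or moving it past the rounding-mode changes.
    volatile double input = value;
    volatile long result = std::lrint(input);
    std::fesetround(saved);
    return result;
}
```

With the default round-to-nearest mode, 1.75 converts to 2, which is the value the old `CHECK: ret i1 true` expectation assumed (two conversions summing to 4). Under `FE_DOWNWARD` or `FE_TOWARDZERO` the same input converts to 1, so the folded answer is not valid in general.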

Revision Contents

Diff 64317

include/llvm/IR/IntrinsicsX86.td

lib/Analysis/ConstantFolding.cpp

lib/IR/AutoUpgrade.cpp

lib/Target/X86/X86InstrSSE.td

test/CodeGen/X86/avx-intrinsics-fast-isel.ll

test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll

test/CodeGen/X86/avx-intrinsics-x86.ll

test/CodeGen/X86/sse-intrinsics-fast-isel-x86_64.ll

test/CodeGen/X86/sse-intrinsics-fast-isel.ll

test/CodeGen/X86/sse2-intrinsics-fast-isel-x86_64.ll

test/CodeGen/X86/sse2-intrinsics-fast-isel.ll

test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll

test/CodeGen/X86/sse2-intrinsics-x86.ll

test/Transforms/ConstProp/calls.ll
