This is an archive of the discontinued LLVM Phabricator instance.

[X86] Implement smarter instruction lowering for FP_TO_UINT from vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction.
ClosedPublic

Authored by RKSimon on Oct 19 2020, 6:36 AM.

Details

Reviewers

craig.topper
spatel
andrew.w.kaylor
yubing
pengfei
lebedev.ri
TomHender

Commits

rGee71c1bbccb1: [X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to…

Summary

We know that "CVTTPS2SI" returns 0x80000000 for out of range inputs. We can use this to make unsigned conversions from vXf32 to vXi32 more efficient, particularly on targets without blend using the following logic:

small := CVTTPS2SI(x);
fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))

Even on targets where "PBLENDVPS"/"PBLENDVB" exists, it is often a latency-2, low-throughput instruction, so this logic is applied there too (in particular for AVX2). It also removes one high-latency floating-point comparison that the previous lowering needed.

@TomHender checked the correctness of this for all possible floats between -1 and 2^32 (both ends excluded).
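The logic in the summary can be modelled one lane at a time in plain C++. This is only a sketch, not code from the patch: `cvtt_emulated` and `fp_to_ui_emulated` are hypothetical helper names, and the emulation assumes that the instruction returns 0x80000000 for NaN and for any input outside the signed 32-bit range.

```cpp
#include <cstdint>

// Emulates one lane of CVTTPS2SI (assumption: truncate toward zero, and
// return 0x80000000 for NaN or out-of-range inputs).
constexpr std::int32_t cvtt_emulated(float x) {
  return (x >= -2147483648.0f && x < 2147483648.0f)
             ? static_cast<std::int32_t>(x)
             : INT32_MIN;
}

// Scalar model of the lowering described in the summary.
constexpr std::uint32_t fp_to_ui_emulated(float x) {
  const std::uint32_t small = static_cast<std::uint32_t>(cvtt_emulated(x));
  const std::uint32_t big =
      static_cast<std::uint32_t>(cvtt_emulated(x - 2147483648.0f));
  // Same value as ARITHMETIC_RIGHT_SHIFT(small, 31): all ones if the sign
  // bit of small is set, zero otherwise.
  const std::uint32_t mask = (small & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  return small | (big & mask);
}
```

For an input such as 3000000000.0f, `small` is 0x80000000, `big` is 852516352, and the OR of the two gives 3000000000. For inputs below 2^31 the mask is zero and `small` is returned unchanged.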

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

TomHender created this revision. Oct 19 2020, 6:36 AM

Herald added a project: Restricted Project. Oct 19 2020, 6:36 AM

Herald added subscribers: llvm-commits, pengfei, dexonsmith, hiraditya.

TomHender requested review of this revision. Oct 19 2020, 6:36 AM

TomHender retitled this revision from "* [x86] Implement smarter instruction lowering for FP_TO_UINT for vXf32 to vXi32 from SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction." to "* [x86] Implement smarter instruction lowering for FP_TO_UINT from vXf32 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction." Oct 19 2020, 6:39 AM

RKSimon added a reviewer: spatel. Oct 19 2020, 7:07 AM

For the costs - ideally you need to run the code snippet through llvm-mca (you can do this in godbolt) for various cpus of that level (e.g. avx1 -> sandybridge/btver2/bdver2, avx2 -> znver2/haswell etc.) and use the worst case throughput cost of those runs. For older targets (sse2...) we're more limited on testable cpu targets, so I tend to just use slm's costs, as they tend to match weak pre-avx cpus.

llvm/lib/Target/X86/X86ISelLowering.cpp
21412	Avoid operator order warnings: if ((VT == MVT::v4i32 && SrcVT == MVT::v4f32) || (VT == MVT::v8i32 && SrcVT == MVT::v8f32))
21437	clang format this block

Harbormaster completed remote builds in B75530: Diff 299025. Oct 19 2020, 7:40 AM

@RKSimon Thank you for your very speedy response.

Should I change the other costs that seem wrong to me? Git blame suggests that they are from 2014 and I think they are just way outdated.

TomHender marked 2 inline comments as done. Oct 19 2020, 7:45 AM

@RKSimon I ran it through llvm-mca now. It gives me a reciprocal throughput of 3.5 for Silvermont and 3 for Haswell for the new instruction sequence.

The problem I see with these values is that they seem incorrect relative to the others. Many values are off by a large factor from what I get when I run them through llvm-mca.
Like { ISD::SINT_TO_FP, MVT::v2f64, MVT::v16i8, 16*10 } for SSE2 for example. I don't understand how anyone came up with an astronomic cost of 160. @RKSimon's method gives a value of 3 for me (Godbolt-Link). Even exploring the generated machine code for ancient LLVM versions or the Agner Fog instruction table for VIA Nano 3000 cannot explain this magnitude.

I must be misunderstanding something still.

dexonsmith removed a subscriber: dexonsmith. Oct 19 2020, 5:32 PM

Gentle ping.

I am still unsure about the cost model thing. Is it possible to get any information about this? Is the policy to just not worry and ignore the other entries?

RKSimon added reviewers: yubing, pengfei. Jan 3 2021, 3:58 AM

No regression appeared in our internal testcases.
It seems the transform is correct, have you verified it with alive-tv?

In D89697#2483676, @yubing wrote:

No regression appeared in our internal testcases.
It seems the transform is correct, have you verified it with alive-tv?

I was curious to see if I could model it:
https://alive2.llvm.org/ce/z/RXcYY9

Converting #x4f800000 (4294967296) to uint32_t is poison, not 0 though. Am I reading the Alive output correctly? (cc @lebedev.ri @aqjune @nlopes @nikic )

In D89697#2484919, @spatel wrote:

In D89697#2483676, @yubing wrote:

No regression appeared in our internal testcases.
It seems the transform is correct, have you verified it with alive-tv?

I was curious to see if I could model it:
https://alive2.llvm.org/ce/z/RXcYY9

Converting #x4f800000 (4294967296) to uint32_t is poison, not 0 though. Am I reading the Alive output correctly? (cc @lebedev.ri @aqjune @nlopes @nikic )

That is how I would read that alive2 report, yes.
Perhaps one needs https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic ?

In D89697#2484919, @spatel wrote:

In D89697#2483676, @yubing wrote:

No regression appeared in our internal testcases.
It seems the transform is correct, have you verified it with alive-tv?

I was curious to see if I could model it:
https://alive2.llvm.org/ce/z/RXcYY9

Converting #x4f800000 (4294967296) to uint32_t is poison, not 0 though. Am I reading the Alive output correctly? (cc @lebedev.ri @aqjune @nlopes @nikic )

That's a nice bug in Alive 😅
It's because UINT_MAX is not representable accurately as a float, so the generated constraints are wrong. I'll have a look, thanks!
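The representability point can be checked with a couple of constants. This is a sketch under the usual assumption of IEEE-754 single precision with round-to-nearest, and the constant names are made up for illustration. UINT_MAX needs 32 significant bits, while a float significand holds 24, so the conversion lands on 2^32, which is the value #x4f800000 from the report.

```cpp
#include <cstdint>

// 4294967295 is not a float; the nearest float is 2^32 = 4294967296.
constexpr float kUintMaxAsFloat = static_cast<float>(UINT32_MAX);

// The largest float strictly below 2^32 is 2^32 - 256, so this is the
// largest float input that still converts to a valid uint32_t.
constexpr float kLargestFloatBelowTwoPow32 = 4294967040.0f;
constexpr std::uint32_t kLargestConvertible =
    static_cast<std::uint32_t>(kLargestFloatBelowTwoPow32);
```

A range check written as "x <= (float)UINT_MAX" therefore accepts 2^32 itself, which is one way a constraint built from UINT_MAX can end up off by one float step.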

In D89697#2484919, @spatel wrote:

In D89697#2483676, @yubing wrote:

No regression appeared in our internal testcases.
It seems the transform is correct, have you verified it with alive-tv?

I was curious to see if I could model it:
https://alive2.llvm.org/ce/z/RXcYY9

Converting #x4f800000 (4294967296) to uint32_t is poison, not 0 though. Am I reading the Alive output correctly? (cc @lebedev.ri @aqjune @nlopes @nikic )

FYI: a fixed version of alive says your file is correct. will commit a fix soon (optimizing it now)

@TomHender Reverse ping - are you still looking at this please?

@TomHender Would you like to continue with this? If not I'd like to commandeer this and complete it.

lebedev.ri added a reviewer: lebedev.ri. Jun 30 2021, 2:23 AM

Sorry for the delayed response from me. I was not actively checking this anymore since there was no further input for a while.

I think back in October I was still looking for a resolution of the apparent cost model inconsistencies. @RKSimon suggested using the LLVM-MCA numbers but it seemed to me that I was adding to the mess, due to apparent inconsistencies in comparison to the other numbers. Thus I was hoping for either a clarification of why this is indeed correct or alternatively an acknowledgement of the issue and that the other costs are still to be updated to be LLVM-MCA-like in the future so this change 'fits in' in the long term.

In any case you are welcome to take this over as I currently don't expect to have time for LLVM this month.

RKSimon commandeered this revision. Jul 4 2021, 12:32 PM

RKSimon edited reviewers, added: TomHender; removed: RKSimon.

In D89697#2857112, @TomHender wrote:

Sorry for the delayed response from me. I was not actively checking this anymore since there was no further input for a while.

I think back in October I was still looking for a resolution of the apparent cost model inconsistencies. @RKSimon suggested using the LLVM-MCA numbers but it seemed to me that I was adding to the mess, due to apparent inconsistencies in comparison to the other numbers. Thus I was hoping for either a clarification of why this is indeed correct or alternatively an acknowledgement of the issue and that the other costs are still to be updated to be LLVM-MCA-like in the future so this change 'fits in' in the long term.

In any case you are welcome to take this over as I currently don't expect to have time for LLVM this month.

OK, I'm going to have a go at getting this finished - I'm slowly cleaning up the costs based on real world throughput numbers from llvm-mca (see D103695) instead of the inconsistent magic numbers we had (which often were based on instruction counts). Thanks for your work on this!

RKSimon mentioned this in rG9dbeac16ba9b: [X86] ReplaceNodeResults - fp_to_sint/uint - manually widen v2i32 results to…. Jul 9 2021, 4:08 AM

I'm still overhauling the cast costs

Updated patch to also handle vXf64->vXi32 vector and f32/f64->u64/u32 scalar fptoui cases.

I've also added AVX1 support for v8f32->v8i32 where we still need a VBLENDPS (instead of a VSRAI ymm).

Refreshed costs using the helper script from D103695 for reference.
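The two ways of combining the results (sign-splat mask on most targets, blend on AVX1 for v8i32) can be compared lane by lane. The sketch below is an illustration, not code from the patch: the function names are hypothetical, `small` and `big` stand for the two CVTT results, and the blend is assumed to select on the sign bit of its condition operand.

```cpp
#include <cstdint>

// Mask form: replicate the sign bit of small across the lane, then OR in
// big only where that mask is set.
constexpr std::uint32_t combine_with_mask(std::uint32_t small,
                                          std::uint32_t big) {
  return small | (big & ((small & 0x80000000u) ? 0xFFFFFFFFu : 0u));
}

// Blend form: pick (small | big) when the sign bit of small is set,
// otherwise keep small.
constexpr std::uint32_t combine_with_blend(std::uint32_t small,
                                           std::uint32_t big) {
  return (small & 0x80000000u) ? (small | big) : small;
}
```

Both forms give the same lane value for every pair of inputs; the difference is only in which instructions are needed, which is why the blend is kept for the case where a 256-bit integer shift is not available.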

Harbormaster completed remote builds in B113579: Diff 358048. Jul 12 2021, 2:36 PM

Can we make the cost model changes as a preliminary/independent commit?

In D89697#2873919, @spatel wrote:

Can we make the cost model changes as a preliminary/independent commit?

Are there any specific cost changes that you think can be pulled out? Most of the updates are necessary to match the change in codegen.

In D89697#2874132, @RKSimon wrote:

In D89697#2873919, @spatel wrote:

Can we make the cost model changes as a preliminary/independent commit?

Are there any specific cost changes that you think can be pulled out? Most of the updates are necessary to match the change in codegen.

Ah, right.
LGTM - some of the test diffs with an extra instruction seem like they would not be wins, but I'm guessing those patterns are rare and the perf diff would be in the noise.

This revision is now accepted and ready to land. Jul 13 2021, 9:41 AM

RKSimon retitled this revision from "* [x86] Implement smarter instruction lowering for FP_TO_UINT from vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction." to "[X86] Implement smarter instruction lowering for FP_TO_UINT from vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction." Jul 14 2021, 3:18 AM

RKSimon edited the summary of this revision.

This revision was landed with ongoing or failed builds. Jul 14 2021, 4:04 AM

Closed by commit rGee71c1bbccb1: [X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to… (authored by RKSimon).

This revision was automatically updated to reflect the committed changes.

RKSimon added a commit: rGee71c1bbccb1: [X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to….

just FYI, this patch is causing problems in https://crbug.com/1230187
still investigating

In D89697#2891293, @aeubanks wrote:

just FYI, this patch is causing problems in https://crbug.com/1230187
still investigating

Any luck?

Not yet, it's hard to figure out exactly which file is causing the problem.
Does this patch exploit UB? Or should it just work for all inputs?

In D89697#2893519, @aeubanks wrote:

Not yet, it's hard to figure out exactly which file is causing the problem.
Does this patch exploit UB? Or should it just work for all inputs?

If the floating point value is negative (apart from -0.0) then there can be diffs - but given it's fp_to_uint, then yes, that's UB.

AFAICT, this patch probably just adjusted some UB behavior due to converting negative floats to unsigned ints. Sorry for the noise.
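To make the negative-input case concrete, here is a lane-level model of what the new sequence computes. It is a sketch only: the helper names are hypothetical, the instruction is assumed to return 0x80000000 for NaN and out-of-range inputs, and nothing here says what the previous lowering produced for the same inputs.

```cpp
#include <cstdint>

// Assumed model of one CVTT lane: truncate toward zero, 0x80000000 when
// the input is NaN or does not fit in a signed 32-bit integer.
constexpr std::int32_t cvtt_lane(float x) {
  return (x >= -2147483648.0f && x < 2147483648.0f)
             ? static_cast<std::int32_t>(x)
             : INT32_MIN;
}

// small | (big & signsplat(small)), written without a signed shift.
constexpr std::uint32_t lowered_fptoui(float x) {
  return static_cast<std::uint32_t>(cvtt_lane(x)) |
         ((cvtt_lane(x) < 0)
              ? static_cast<std::uint32_t>(cvtt_lane(x - 2147483648.0f))
              : 0u);
}
```

Under this model -0.5f still gives 0, because it truncates to zero, while -1.0f gives 0xFFFFFFFF. The second value is simply whatever falls out of the bit operations; the IR-level conversion has no defined result for it, so code that depends on a particular value there can change behaviour when the lowering changes.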

Revision Contents

Path	Size
llvm/lib/Target/X86/X86ISelLowering.cpp	107 lines
llvm/lib/Target/X86/X86TargetTransformInfo.cpp	29 lines
llvm/test/Analysis/CostModel/X86/fptoui.ll	133 lines
llvm/test/CodeGen/X86/ (eight files; names not preserved in this archive)	196, 108, 119, 27, 137, 66, 11 and 903 lines
llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll	59 lines

Diff 358554

llvm/lib/Target/X86/X86ISelLowering.cpp

@@ 1,007 lines not shown @@ if (!Subtarget.useSoftFloat() && Subtarget.hasSSE2()) {
     // Custom lower v2i64 and v2f64 selects.
     setOperationAction(ISD::SELECT, MVT::v2f64, Custom);
     setOperationAction(ISD::SELECT, MVT::v2i64, Custom);
     setOperationAction(ISD::SELECT, MVT::v4i32, Custom);
     setOperationAction(ISD::SELECT, MVT::v8i16, Custom);
     setOperationAction(ISD::SELECT, MVT::v16i8, Custom);

     setOperationAction(ISD::FP_TO_SINT, MVT::v4i32, Legal);
+    setOperationAction(ISD::FP_TO_UINT, MVT::v4i32, Custom);
     setOperationAction(ISD::FP_TO_SINT, MVT::v2i32, Custom);
+    setOperationAction(ISD::FP_TO_UINT, MVT::v2i32, Custom);
     setOperationAction(ISD::STRICT_FP_TO_SINT, MVT::v4i32, Legal);
     setOperationAction(ISD::STRICT_FP_TO_SINT, MVT::v2i32, Custom);

     // Custom legalize these to avoid over promotion or custom promotion.
     for (auto VT : {MVT::v2i8, MVT::v4i8, MVT::v8i8, MVT::v2i16, MVT::v4i16}) {
       setOperationAction(ISD::FP_TO_SINT, VT, Custom);
       setOperationAction(ISD::FP_TO_UINT, VT, Custom);
       setOperationAction(ISD::STRICT_FP_TO_SINT, VT, Custom);
@@ 218 lines not shown @@ if (!Subtarget.useSoftFloat() && Subtarget.hasAVX()) {

     // (fp_to_int:v8i16 (v8f32 ..)) requires the result type to be promoted
     // even though v8i16 is a legal type.
     setOperationPromotedToType(ISD::FP_TO_SINT, MVT::v8i16, MVT::v8i32);
     setOperationPromotedToType(ISD::FP_TO_UINT, MVT::v8i16, MVT::v8i32);
     setOperationPromotedToType(ISD::STRICT_FP_TO_SINT, MVT::v8i16, MVT::v8i32);
     setOperationPromotedToType(ISD::STRICT_FP_TO_UINT, MVT::v8i16, MVT::v8i32);
     setOperationAction(ISD::FP_TO_SINT, MVT::v8i32, Legal);
+    setOperationAction(ISD::FP_TO_UINT, MVT::v8i32, Custom);
     setOperationAction(ISD::STRICT_FP_TO_SINT, MVT::v8i32, Legal);

     setOperationAction(ISD::SINT_TO_FP, MVT::v8i32, Legal);
     setOperationAction(ISD::STRICT_SINT_TO_FP, MVT::v8i32, Legal);

     setOperationAction(ISD::STRICT_FP_ROUND, MVT::v4f32, Legal);
     setOperationAction(ISD::STRICT_FADD, MVT::v8f32, Legal);
     setOperationAction(ISD::STRICT_FADD, MVT::v4f64, Legal);
@@ 511 lines not shown @@
   if (!Subtarget.useSoftFloat() && Subtarget.hasAVX512()) {
     // These operations are handled on non-VLX by artificially widening in
     // isel patterns.

     setOperationAction(ISD::FP_TO_UINT, MVT::v8i32,
                        Subtarget.hasVLX() ? Legal : Custom);
     setOperationAction(ISD::FP_TO_UINT, MVT::v4i32,
                        Subtarget.hasVLX() ? Legal : Custom);
-    setOperationAction(ISD::FP_TO_UINT, MVT::v2i32, Custom);
     setOperationAction(ISD::STRICT_FP_TO_UINT, MVT::v8i32,
                        Subtarget.hasVLX() ? Legal : Custom);
     setOperationAction(ISD::STRICT_FP_TO_UINT, MVT::v4i32,
                        Subtarget.hasVLX() ? Legal : Custom);
     setOperationAction(ISD::STRICT_FP_TO_UINT, MVT::v2i32, Custom);
     setOperationAction(ISD::UINT_TO_FP, MVT::v8i32,
                        Subtarget.hasVLX() ? Legal : Custom);
     setOperationAction(ISD::UINT_TO_FP, MVT::v4i32,
@@ 19,423 lines not shown @@ if (VT == MVT::v16i8 && InVT == MVT::v16i16) {
     SDValue InHi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v8i16, In,
                                DAG.getIntPtrConstant(8, DL));
     return DAG.getNode(X86ISD::PACKUS, DL, VT, InLo, InHi);
   }

   llvm_unreachable("All 256->128 cases should have been handled above!");
 }

+// We can leverage the specific way the "cvttps2dq/cvttpd2dq" instruction
+// behaves on out of range inputs to generate optimized conversions.
+static SDValue expandFP_TO_UINT_SSE(MVT VT, SDValue Src, const SDLoc &dl,
+                                    SelectionDAG &DAG,
+                                    const X86Subtarget &Subtarget) {
+  MVT SrcVT = Src.getSimpleValueType();
+  unsigned DstBits = VT.getScalarSizeInBits();
+  assert(DstBits == 32 && "expandFP_TO_UINT_SSE - only vXi32 supported");
+
+  // Calculate the converted result for values in the range 0 to
+  // 2^31-1 ("Small") and from 2^31 to 2^32-1 ("Big").
+  SDValue Small = DAG.getNode(X86ISD::CVTTP2SI, dl, VT, Src);
+  SDValue Big =
+      DAG.getNode(X86ISD::CVTTP2SI, dl, VT,
+                  DAG.getNode(ISD::FSUB, dl, SrcVT, Src,
+                              DAG.getConstantFP(2147483648.0f, dl, SrcVT)));
+
+  // The "CVTTP2SI" instruction conveniently sets the sign bit if
+  // and only if the value was out of range. So we can use that
+  // as our indicator that we rather use "Big" instead of "Small".
+  //
+  // Use "Small" if "IsOverflown" has all bits cleared
+  // and "0x80000000 | Big" if all bits in "IsOverflown" are set.
+
+  // AVX1 can't use the signsplat masking for 256-bit vectors - we have to
+  // use the slightly slower blendv select instead.
+  if (VT == MVT::v8i32 && !Subtarget.hasAVX2()) {
+    SDValue Overflow = DAG.getNode(ISD::OR, dl, VT, Small, Big);
+    return DAG.getNode(X86ISD::BLENDV, dl, VT, Small, Overflow, Small);
+  }
+
+  SDValue IsOverflown =
+      DAG.getNode(X86ISD::VSRAI, dl, VT, Small,
+                  DAG.getTargetConstant(DstBits - 1, dl, MVT::i8));
+  return DAG.getNode(ISD::OR, dl, VT, Small,
+                     DAG.getNode(ISD::AND, dl, VT, Big, IsOverflown));
+}
+
 SDValue X86TargetLowering::LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const {
   bool IsStrict = Op->isStrictFPOpcode();
   bool IsSigned = Op.getOpcode() == ISD::FP_TO_SINT ||
                   Op.getOpcode() == ISD::STRICT_FP_TO_SINT;
   MVT VT = Op->getSimpleValueType(0);
   SDValue Src = Op.getOperand(IsStrict ? 1 : 0);
   MVT SrcVT = Src.getSimpleValueType();
   SDLoc dl(Op);
@@ 43 lines not shown @@ if (VT.isVector()) {
     if (VT == MVT::v8i32 && SrcVT == MVT::v8f64) {
       assert(!IsSigned && "Expected unsigned conversion!");
       assert(Subtarget.useAVX512Regs() && "Requires avx512f");
       return Op;
     }

     // Widen vXi32 fp_to_uint with avx512f to 512-bit source.
     if ((VT == MVT::v4i32 || VT == MVT::v8i32) &&
-        (SrcVT == MVT::v4f64 || SrcVT == MVT::v4f32 || SrcVT == MVT::v8f32)) {
+        (SrcVT == MVT::v4f64 || SrcVT == MVT::v4f32 || SrcVT == MVT::v8f32) &&
+        Subtarget.useAVX512Regs()) {
       assert(!IsSigned && "Expected unsigned conversion!");
-      assert(Subtarget.useAVX512Regs() && !Subtarget.hasVLX() &&
-             "Unexpected features!");
+      assert(!Subtarget.hasVLX() && "Unexpected features!");
       MVT WideVT = SrcVT == MVT::v4f64 ? MVT::v8f64 : MVT::v16f32;
       MVT ResVT = SrcVT == MVT::v4f64 ? MVT::v8i32 : MVT::v16i32;
       // Need to concat with zero vector for strict fp to avoid spurious
       // exceptions.
       // TODO: Should we just do this for non-strict as well?
       SDValue Tmp =
           IsStrict ? DAG.getConstantFP(0.0, dl, WideVT) : DAG.getUNDEF(WideVT);
       Src = DAG.getNode(ISD::INSERT_SUBVECTOR, dl, WideVT, Tmp, Src,
@@ 13 lines not shown @@ if ((VT == MVT::v4i32 || VT == MVT::v8i32) &&

       if (IsStrict)
         return DAG.getMergeValues({Res, Chain}, dl);
       return Res;
     }

     // Widen vXi64 fp_to_uint/fp_to_sint with avx512dq to 512-bit source.
     if ((VT == MVT::v2i64 || VT == MVT::v4i64) &&
-        (SrcVT == MVT::v2f64 || SrcVT == MVT::v4f64 || SrcVT == MVT::v4f32)) {
-      assert(Subtarget.useAVX512Regs() && Subtarget.hasDQI() &&
-             !Subtarget.hasVLX() && "Unexpected features!");
+        (SrcVT == MVT::v2f64 || SrcVT == MVT::v4f64 || SrcVT == MVT::v4f32) &&
+        Subtarget.useAVX512Regs() && Subtarget.hasDQI()) {
+      assert(!Subtarget.hasVLX() && "Unexpected features!");
       MVT WideVT = SrcVT == MVT::v4f32 ? MVT::v8f32 : MVT::v8f64;
       // Need to concat with zero vector for strict fp to avoid spurious
       // exceptions.
       // TODO: Should we just do this for non-strict as well?
       SDValue Tmp =
           IsStrict ? DAG.getConstantFP(0.0, dl, WideVT) : DAG.getUNDEF(WideVT);
       Src = DAG.getNode(ISD::INSERT_SUBVECTOR, dl, WideVT, Tmp, Src,
                         DAG.getIntPtrConstant(0, dl));
@@ 40 lines not shown @@ if (VT == MVT::v2i64 && SrcVT == MVT::v2f32) {
         unsigned Opc = IsSigned ? X86ISD::STRICT_CVTTP2SI
                                 : X86ISD::STRICT_CVTTP2UI;
         return DAG.getNode(Opc, dl, {VT, MVT::Other}, {Op->getOperand(0), Tmp});
       }
       unsigned Opc = IsSigned ? X86ISD::CVTTP2SI : X86ISD::CVTTP2UI;
       return DAG.getNode(Opc, dl, VT, Tmp);
     }

+    // Generate optimized instructions for pre AVX512 unsigned conversions from
+    // vXf32 to vXi32.
+    if ((VT == MVT::v4i32 && SrcVT == MVT::v4f32) ||
+        (VT == MVT::v4i32 && SrcVT == MVT::v4f64) ||
+        (VT == MVT::v8i32 && SrcVT == MVT::v8f32)) {
+      assert(!IsSigned && "Expected unsigned conversion!");
+      return expandFP_TO_UINT_SSE(VT, Src, dl, DAG, Subtarget);
+    }
+
     return SDValue();
   }

   assert(!VT.isVector());

   bool UseSSEReg = isScalarFPTypeInSSEReg(SrcVT);

   if (!IsSigned && UseSSEReg) {
     // Conversions from f32/f64 with AVX512 should be legal.
     if (Subtarget.hasAVX512())
       return Op;

+    // We can leverage the specific way the "cvttss2si/cvttsd2si" instruction
+    // behaves on out of range inputs to generate optimized conversions.
+    if (!IsStrict && ((VT == MVT::i32 && !Subtarget.is64Bit()) ||
+                      (VT == MVT::i64 && Subtarget.is64Bit()))) {
+      unsigned DstBits = VT.getScalarSizeInBits();
+      APInt UIntLimit = APInt::getSignMask(DstBits);
+      SDValue FloatOffset = DAG.getNode(ISD::UINT_TO_FP, dl, SrcVT,
+                                        DAG.getConstant(UIntLimit, dl, VT));
+      MVT SrcVecVT = MVT::getVectorVT(SrcVT, 128 / SrcVT.getScalarSizeInBits());
+
+      // Calculate the converted result for values in the range:
+      // (i32) 0 to 2^31-1 ("Small") and from 2^31 to 2^32-1 ("Big").
+      // (i64) 0 to 2^63-1 ("Small") and from 2^63 to 2^64-1 ("Big").
+      SDValue Small =
+          DAG.getNode(X86ISD::CVTTS2SI, dl, VT,
+                      DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, SrcVecVT, Src));
+      SDValue Big = DAG.getNode(
+          X86ISD::CVTTS2SI, dl, VT,
+          DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, SrcVecVT,
+                      DAG.getNode(ISD::FSUB, dl, SrcVT, Src, FloatOffset)));
+
+      // The "CVTTS2SI" instruction conveniently sets the sign bit if
+      // and only if the value was out of range. So we can use that
+      // as our indicator that we rather use "Big" instead of "Small".
+      //
+      // Use "Small" if "IsOverflown" has all bits cleared
+      // and "0x80000000 | Big" if all bits in "IsOverflown" are set.
+      SDValue IsOverflown = DAG.getNode(
+          ISD::SRA, dl, VT, Small, DAG.getConstant(DstBits - 1, dl, MVT::i8));
+      return DAG.getNode(ISD::OR, dl, VT, Small,
+                         DAG.getNode(ISD::AND, dl, VT, Big, IsOverflown));
+    }
+
     // Use default expansion for i64.
     if (VT == MVT::i64)
       return SDValue();

     assert(VT == MVT::i32 && "Unexpected VT!");

     // Promote i32 to i64 and use a signed operation on 64-bit targets.
     // FIXME: This does not generate an invalid exception if the input does not
@@ 9,387 lines not shown @@ if (VT.isVector() && VT.getScalarSizeInBits() < 32) {
       Results.push_back(Res);
       if (IsStrict)
         Results.push_back(Chain);
       return;
     }


     if (VT == MVT::v2i32) {
-      assert((IsSigned || Subtarget.hasAVX512()) &&
-             "Can only handle signed conversion without AVX512");
+      assert((!IsStrict || IsSigned || Subtarget.hasAVX512()) &&
+             "Strict unsigned conversion requires AVX512");
       assert(Subtarget.hasSSE2() && "Requires at least SSE2!");
       assert(getTypeAction(*DAG.getContext(), VT) == TypeWidenVector &&
              "Unexpected type action!");
       if (Src.getValueType() == MVT::v2f64) {
+        if (!IsSigned && !Subtarget.hasAVX512()) {
+          SDValue Res =
+              expandFP_TO_UINT_SSE(MVT::v4i32, Src, dl, DAG, Subtarget);
+          Results.push_back(Res);
+          return;
+        }
+
         unsigned Opc;
         if (IsStrict)
           Opc = IsSigned ? X86ISD::STRICT_CVTTP2SI : X86ISD::STRICT_CVTTP2UI;
         else
           Opc = IsSigned ? X86ISD::CVTTP2SI : X86ISD::CVTTP2UI;

         // If we have VLX we can emit a target specific FP_TO_UINT node,.
         if (!IsSigned && !Subtarget.hasVLX()) {
@@ 21,568 lines not shown @@

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

@@ 1,908 lines not shown @@ static const TypeConversionCostTblEntry AVX2ConversionTbl[] = {
     { ISD::FP_EXTEND, MVT::v8f64, MVT::v8f32, 3 },
     { ISD::FP_ROUND, MVT::v8f32, MVT::v8f64, 3 },

     { ISD::FP_TO_SINT, MVT::v16i16, MVT::v8f32, 1 },
     { ISD::FP_TO_SINT, MVT::v4i32, MVT::v4f64, 1 },
     { ISD::FP_TO_SINT, MVT::v8i32, MVT::v8f32, 1 },
     { ISD::FP_TO_SINT, MVT::v8i32, MVT::v8f64, 3 },

+    { ISD::FP_TO_UINT, MVT::i64, MVT::f32, 3 },
+    { ISD::FP_TO_UINT, MVT::i64, MVT::f64, 3 },
     { ISD::FP_TO_UINT, MVT::v16i16, MVT::v8f32, 1 },
-    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 4 },
-    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v2f64, 7 },
-    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 7 },
-    { ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 4 },
-    { ISD::FP_TO_UINT, MVT::v8i32, MVT::v4f64, 7 },
+    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 3 },
+    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v2f64, 4 },
+    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 4 },
+    { ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 3 },
+    { ISD::FP_TO_UINT, MVT::v8i32, MVT::v4f64, 4 },

     { ISD::SINT_TO_FP, MVT::v2f64, MVT::v16i8, 2 },
     { ISD::SINT_TO_FP, MVT::v8f32, MVT::v16i8, 2 },
     { ISD::SINT_TO_FP, MVT::v2f64, MVT::v8i16, 2 },
     { ISD::SINT_TO_FP, MVT::v8f32, MVT::v8i16, 2 },
     { ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i32, 1 },
     { ISD::SINT_TO_FP, MVT::v8f32, MVT::v8i32, 1 },
     { ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i32, 3 },
@@ 90 lines not shown @@ static const TypeConversionCostTblEntry AVXConversionTbl[] = {
     { ISD::FP_TO_UINT, MVT::v16i8, MVT::v8f32, 2 },
     { ISD::FP_TO_UINT, MVT::v16i8, MVT::v4f64, 2 },
     { ISD::FP_TO_UINT, MVT::v32i8, MVT::v8f32, 2 },
     { ISD::FP_TO_UINT, MVT::v32i8, MVT::v4f64, 2 },
     { ISD::FP_TO_UINT, MVT::v8i16, MVT::v8f32, 2 },
     { ISD::FP_TO_UINT, MVT::v8i16, MVT::v4f64, 2 },
     { ISD::FP_TO_UINT, MVT::v16i16, MVT::v8f32, 2 },
     { ISD::FP_TO_UINT, MVT::v16i16, MVT::v4f64, 2 },
-    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v2f64, 9 },
-    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 9 },
-    { ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 9 },
-    { ISD::FP_TO_UINT, MVT::v8i32, MVT::v4f64, 9 },
+    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 3 },
+    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v2f64, 4 },
+    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 6 },
+    { ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 7 },
+    { ISD::FP_TO_UINT, MVT::v8i32, MVT::v4f64, 7 },

     { ISD::FP_EXTEND, MVT::v4f64, MVT::v4f32, 1 },
     { ISD::FP_ROUND, MVT::v4f32, MVT::v4f64, 1 },
   };

   static const TypeConversionCostTblEntry SSE41ConversionTbl[] = {
     { ISD::ZERO_EXTEND, MVT::v2i64, MVT::v16i8, 1 },
     { ISD::SIGN_EXTEND, MVT::v2i64, MVT::v16i8, 1 },
[51 lines not shown]	static const TypeConversionCostTblEntry SSE41ConversionTbl[] = {
{ ISD::FP_TO_SINT, MVT::v16i8, MVT::v4f32, 2 },		{ ISD::FP_TO_SINT, MVT::v16i8, MVT::v4f32, 2 },
{ ISD::FP_TO_SINT, MVT::v16i8, MVT::v2f64, 2 },		{ ISD::FP_TO_SINT, MVT::v16i8, MVT::v2f64, 2 },
{ ISD::FP_TO_SINT, MVT::v8i16, MVT::v4f32, 1 },		{ ISD::FP_TO_SINT, MVT::v8i16, MVT::v4f32, 1 },
{ ISD::FP_TO_SINT, MVT::v8i16, MVT::v2f64, 1 },		{ ISD::FP_TO_SINT, MVT::v8i16, MVT::v2f64, 1 },
{ ISD::FP_TO_SINT, MVT::v4i32, MVT::v4f32, 1 },		{ ISD::FP_TO_SINT, MVT::v4i32, MVT::v4f32, 1 },
{ ISD::FP_TO_SINT, MVT::v4i32, MVT::v2f64, 1 },		{ ISD::FP_TO_SINT, MVT::v4i32, MVT::v2f64, 1 },

{ ISD::FP_TO_UINT, MVT::i32, MVT::f32, 1 },		{ ISD::FP_TO_UINT, MVT::i32, MVT::f32, 1 },
{ ISD::FP_TO_UINT, MVT::i64, MVT::f32, 5 },		{ ISD::FP_TO_UINT, MVT::i64, MVT::f32, 4 },
{ ISD::FP_TO_UINT, MVT::i32, MVT::f64, 1 },		{ ISD::FP_TO_UINT, MVT::i32, MVT::f64, 1 },
{ ISD::FP_TO_UINT, MVT::i64, MVT::f64, 5 },		{ ISD::FP_TO_UINT, MVT::i64, MVT::f64, 4 },
{ ISD::FP_TO_UINT, MVT::v16i8, MVT::v4f32, 2 },		{ ISD::FP_TO_UINT, MVT::v16i8, MVT::v4f32, 2 },
{ ISD::FP_TO_UINT, MVT::v16i8, MVT::v2f64, 2 },		{ ISD::FP_TO_UINT, MVT::v16i8, MVT::v2f64, 2 },
{ ISD::FP_TO_UINT, MVT::v8i16, MVT::v4f32, 1 },		{ ISD::FP_TO_UINT, MVT::v8i16, MVT::v4f32, 1 },
{ ISD::FP_TO_UINT, MVT::v8i16, MVT::v2f64, 1 },		{ ISD::FP_TO_UINT, MVT::v8i16, MVT::v2f64, 1 },
{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 6 },		{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 4 },
{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v2f64, 3 },		{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v2f64, 4 },
};		};

static const TypeConversionCostTblEntry SSE2ConversionTbl[] = {		static const TypeConversionCostTblEntry SSE2ConversionTbl[] = {
// These are somewhat magic numbers justified by comparing the		// These are somewhat magic numbers justified by comparing the
// output of llvm-mca for our various supported scheduler models		// output of llvm-mca for our various supported scheduler models
// and basing it off the worst case scenario.		// and basing it off the worst case scenario.
{ ISD::SINT_TO_FP, MVT::f32, MVT::i32, 3 },		{ ISD::SINT_TO_FP, MVT::f32, MVT::i32, 3 },
{ ISD::SINT_TO_FP, MVT::f64, MVT::i32, 3 },		{ ISD::SINT_TO_FP, MVT::f64, MVT::i32, 3 },
[3,054 lines not shown]
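
The lower FP_TO_UINT costs in the tables above reflect the lowering described in the summary. As a rough illustration only, the following portable scalar sketch models a single lane of that sequence. The helper names `cvttps2si_lane` and `fp_to_ui_lane` are hypothetical and are not part of the patch; the model of CVTTPS2SI (truncation toward zero, 0x80000000 for NaN or out-of-range inputs) follows the statement in the summary.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of one CVTTPS2SI lane: truncate toward zero and
// return 0x80000000 (INT32_MIN) for NaN or out-of-range inputs.
static int32_t cvttps2si_lane(float x) {
  if (std::isnan(x) || x >= 2147483648.0f || x < -2147483648.0f)
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(x);
}

// One lane of the formula from the summary:
//   small := CVTTPS2SI(x)
//   fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))
static uint32_t fp_to_ui_lane(float x) {
  const int32_t small = cvttps2si_lane(x);
  const int32_t big = cvttps2si_lane(x - 2147483648.0f);
  // Equivalent to an arithmetic right shift of `small` by 31 bits:
  // all ones if the sign bit is set, otherwise zero.
  const int32_t mask = small < 0 ? -1 : 0;
  return static_cast<uint32_t>(small) | static_cast<uint32_t>(big & mask);
}
```

For inputs below 2^31 the first conversion is already correct and the mask is zero. For inputs in [2^31, 2^32) the first conversion yields 0x80000000, which both supplies the top bit and enables the second, offset conversion. The sketch is only meant for the input range the summary says was checked (between -1 and 2^32, both ends excluded).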

llvm/test/Analysis/CostModel/X86/fptoui.ll

	[13 lines not shown]
	; SSE2-LABEL: 'fptoui_double_i64'			; SSE2-LABEL: 'fptoui_double_i64'
	; SSE2-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %I64 = fptoui double undef to i64			; SSE2-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %I64 = fptoui double undef to i64
	; SSE2-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>			; SSE2-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>			; SSE2-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 144 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>			; SSE2-NEXT: Cost Model: Found an estimated cost of 144 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; SSE42-LABEL: 'fptoui_double_i64'			; SSE42-LABEL: 'fptoui_double_i64'
	; SSE42-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I64 = fptoui double undef to i64			; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui double undef to i64
	; SSE42-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>			; SSE42-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>			; SSE42-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>			; SSE42-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX-LABEL: 'fptoui_double_i64'			; AVX1-LABEL: 'fptoui_double_i64'
	; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I64 = fptoui double undef to i64			; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui double undef to i64
	; AVX-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>			; AVX1-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>
	; AVX-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>			; AVX1-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>
	; AVX-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>			; AVX1-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>
	; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
				; AVX2-LABEL: 'fptoui_double_i64'
				; AVX2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I64 = fptoui double undef to i64
				; AVX2-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>
				; AVX2-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>
				; AVX2-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>
				; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX512F-LABEL: 'fptoui_double_i64'			; AVX512F-LABEL: 'fptoui_double_i64'
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui double undef to i64			; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui double undef to i64
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512F-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX512DQ-LABEL: 'fptoui_double_i64'			; AVX512DQ-LABEL: 'fptoui_double_i64'
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui double undef to i64			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui double undef to i64
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; SLM-LABEL: 'fptoui_double_i64'			; SLM-LABEL: 'fptoui_double_i64'
	; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I64 = fptoui double undef to i64			; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui double undef to i64
	; SLM-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>			; SLM-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V2I64 = fptoui <2 x double> undef to <2 x i64>
	; SLM-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>			; SLM-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V4I64 = fptoui <4 x double> undef to <4 x i64>
	; SLM-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>			; SLM-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %V8I64 = fptoui <8 x double> undef to <8 x i64>
	; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I64 = fptoui double undef to i64			%I64 = fptoui double undef to i64
	%V2I64 = fptoui <2 x double> undef to <2 x i64>			%V2I64 = fptoui <2 x double> undef to <2 x i64>
	%V4I64 = fptoui <4 x double> undef to <4 x i64>			%V4I64 = fptoui <4 x double> undef to <4 x i64>
	%V8I64 = fptoui <8 x double> undef to <8 x i64>			%V8I64 = fptoui <8 x double> undef to <8 x i64>
	ret i32 undef			ret i32 undef
	}			}

	define i32 @fptoui_double_i32(i32 %arg) {			define i32 @fptoui_double_i32(i32 %arg) {
	; SSE2-LABEL: 'fptoui_double_i32'			; SSE2-LABEL: 'fptoui_double_i32'
	; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I32 = fptoui double undef to i32			; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I32 = fptoui double undef to i32
	; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>			; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>			; SSE2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>			; SSE2-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; SSE42-LABEL: 'fptoui_double_i32'			; SSE42-LABEL: 'fptoui_double_i32'
	; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32			; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
	; SSE42-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>			; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>			; SSE42-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>			; SSE42-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX1-LABEL: 'fptoui_double_i32'			; AVX1-LABEL: 'fptoui_double_i32'
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
	; AVX1-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>			; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>			; AVX1-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>			; AVX1-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX2-LABEL: 'fptoui_double_i32'			; AVX2-LABEL: 'fptoui_double_i32'
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
	; AVX2-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>			; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>			; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>			; AVX2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX512-LABEL: 'fptoui_double_i32'			; AVX512-LABEL: 'fptoui_double_i32'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; SLM-LABEL: 'fptoui_double_i32'			; SLM-LABEL: 'fptoui_double_i32'
	; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32			; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
	; SLM-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>			; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
	; SLM-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>			; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
	; SLM-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>			; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
	; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I32 = fptoui double undef to i32			%I32 = fptoui double undef to i32
	%V2I32 = fptoui <2 x double> undef to <2 x i32>			%V2I32 = fptoui <2 x double> undef to <2 x i32>
	%V4I32 = fptoui <4 x double> undef to <4 x i32>			%V4I32 = fptoui <4 x double> undef to <4 x i32>
	%V8I32 = fptoui <8 x double> undef to <8 x i32>			%V8I32 = fptoui <8 x double> undef to <8 x i32>
	ret i32 undef			ret i32 undef
	}			}
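
The expected costs for the wider vectors above appear to be the per-entry table cost multiplied by the number of legal-width pieces the source vector is split into. That scaling rule is an assumption inferred from the numbers in this diff, not something the diff states. The sketch below uses hypothetical names (`CostEntry`, `scaledCost`) and plain strings in place of LLVM's `MVT` types; the entries are copied from the new (right-hand) side of `AVXConversionTbl` and `SSE41ConversionTbl`.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a conversion cost table entry.
struct CostEntry {
  std::string Dst;
  std::string Src;
  int Cost;
};

// Entries copied from the right-hand side of the tables in this diff.
static const std::vector<CostEntry> AVXTbl = {
    {"v8i32", "v8f32", 7},
    {"v8i32", "v4f64", 7},
};
static const std::vector<CostEntry> SSE41Tbl = {
    {"v4i32", "v4f32", 4},
    {"v4i32", "v2f64", 4},
};

// Assumed rule: total cost = number of legal-width pieces * table cost.
// Returns -1 when the table has no matching entry.
static int scaledCost(const std::vector<CostEntry> &Tbl, const std::string &Dst,
                      const std::string &Src, int NumParts) {
  for (const CostEntry &E : Tbl)
    if (E.Dst == Dst && E.Src == Src)
      return NumParts * E.Cost;
  return -1;
}
```

Under this assumption, `<8 x double>` to `<8 x i32>` on AVX1 splits into two v4f64 pieces for a cost of 2 * 7 = 14, and on SSE4.2 into four v2f64 pieces for 4 * 4 = 16, which matches the updated CHECK lines.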
	[89 lines not shown]
	; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui float undef to i64			; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui float undef to i64
	; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>			; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>			; SSE2-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>			; SSE2-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 116 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>			; SSE2-NEXT: Cost Model: Found an estimated cost of 116 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; SSE42-LABEL: 'fptoui_float_i64'			; SSE42-LABEL: 'fptoui_float_i64'
	; SSE42-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I64 = fptoui float undef to i64			; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui float undef to i64
	; SSE42-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>			; SSE42-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>			; SSE42-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>			; SSE42-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 116 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>			; SSE42-NEXT: Cost Model: Found an estimated cost of 100 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX-LABEL: 'fptoui_float_i64'			; AVX1-LABEL: 'fptoui_float_i64'
	; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I64 = fptoui float undef to i64			; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui float undef to i64
	; AVX-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>			; AVX1-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>
	; AVX-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>			; AVX1-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>
	; AVX-NEXT: Cost Model: Found an estimated cost of 65 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>			; AVX1-NEXT: Cost Model: Found an estimated cost of 57 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>
	; AVX-NEXT: Cost Model: Found an estimated cost of 130 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>			; AVX1-NEXT: Cost Model: Found an estimated cost of 114 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>
	; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
				;
				; AVX2-LABEL: 'fptoui_float_i64'
				; AVX2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I64 = fptoui float undef to i64
				; AVX2-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>
				; AVX2-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>
				; AVX2-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>
				; AVX2-NEXT: Cost Model: Found an estimated cost of 98 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>
				; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX512F-LABEL: 'fptoui_float_i64'			; AVX512F-LABEL: 'fptoui_float_i64'
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui float undef to i64			; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui float undef to i64
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 69 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 69 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512F-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX512DQ-LABEL: 'fptoui_float_i64'			; AVX512DQ-LABEL: 'fptoui_float_i64'
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui float undef to i64			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = fptoui float undef to i64
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>
	; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512DQ-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; SLM-LABEL: 'fptoui_float_i64'			; SLM-LABEL: 'fptoui_float_i64'
	; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I64 = fptoui float undef to i64			; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I64 = fptoui float undef to i64
	; SLM-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>			; SLM-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V2I64 = fptoui <2 x float> undef to <2 x i64>
	; SLM-NEXT: Cost Model: Found an estimated cost of 41 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>			; SLM-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %V4I64 = fptoui <4 x float> undef to <4 x i64>
	; SLM-NEXT: Cost Model: Found an estimated cost of 82 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>			; SLM-NEXT: Cost Model: Found an estimated cost of 74 for instruction: %V8I64 = fptoui <8 x float> undef to <8 x i64>
	; SLM-NEXT: Cost Model: Found an estimated cost of 164 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>			; SLM-NEXT: Cost Model: Found an estimated cost of 148 for instruction: %V16I64 = fptoui <16 x float> undef to <16 x i64>
	; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I64 = fptoui float undef to i64			%I64 = fptoui float undef to i64
	%V2I64 = fptoui <2 x float> undef to <2 x i64>			%V2I64 = fptoui <2 x float> undef to <2 x i64>
	%V4I64 = fptoui <4 x float> undef to <4 x i64>			%V4I64 = fptoui <4 x float> undef to <4 x i64>
	%V8I64 = fptoui <8 x float> undef to <8 x i64>			%V8I64 = fptoui <8 x float> undef to <8 x i64>
	%V16I64 = fptoui <16 x float> undef to <16 x i64>			%V16I64 = fptoui <16 x float> undef to <16 x i64>
	ret i32 undef			ret i32 undef
	}			}

	define i32 @fptoui_float_i32(i32 %arg) {			define i32 @fptoui_float_i32(i32 %arg) {
	; SSE2-LABEL: 'fptoui_float_i32'			; SSE2-LABEL: 'fptoui_float_i32'
	; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I32 = fptoui float undef to i32			; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I32 = fptoui float undef to i32
	; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>			; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>			; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>			; SSE2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>			; SSE2-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; SSE42-LABEL: 'fptoui_float_i32'			; SSE42-LABEL: 'fptoui_float_i32'
	; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32			; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32
	; SSE42-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>			; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>			; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>			; SSE42-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>			; SSE42-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX1-LABEL: 'fptoui_float_i32'			; AVX1-LABEL: 'fptoui_float_i32'
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32
	; AVX1-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>			; AVX1-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>			; AVX1-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>			; AVX1-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>			; AVX1-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX2-LABEL: 'fptoui_float_i32'			; AVX2-LABEL: 'fptoui_float_i32'
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32
	; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>			; AVX2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>			; AVX2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>			; AVX2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>			; AVX2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX512-LABEL: 'fptoui_float_i32'			; AVX512-LABEL: 'fptoui_float_i32'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; SLM-LABEL: 'fptoui_float_i32'			; SLM-LABEL: 'fptoui_float_i32'
	; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32			; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui float undef to i32
	; SLM-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>			; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2I32 = fptoui <2 x float> undef to <2 x i32>
	; SLM-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>			; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V4I32 = fptoui <4 x float> undef to <4 x i32>
	; SLM-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>			; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V8I32 = fptoui <8 x float> undef to <8 x i32>
	; SLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>			; SLM-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V16I32 = fptoui <16 x float> undef to <16 x i32>
	; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	%I32 = fptoui float undef to i32			%I32 = fptoui float undef to i32
	%V2I32 = fptoui <2 x float> undef to <2 x i32>			%V2I32 = fptoui <2 x float> undef to <2 x i32>
	%V4I32 = fptoui <4 x float> undef to <4 x i32>			%V4I32 = fptoui <4 x float> undef to <4 x i32>
	%V8I32 = fptoui <8 x float> undef to <8 x i32>			%V8I32 = fptoui <8 x float> undef to <8 x i32>
	%V16I32 = fptoui <16 x float> undef to <16 x i32>			%V16I32 = fptoui <16 x float> undef to <16 x i32>
	ret i32 undef			ret i32 undef
	(107 lines not shown)

llvm/test/CodeGen/X86/concat-cast.ll

	(103 lines not shown)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%s0 = fptosi <2 x float> %x to <2 x i32>			%s0 = fptosi <2 x float> %x to <2 x i32>
	%s1 = fptosi <2 x float> %y to <2 x i32>			%s1 = fptosi <2 x float> %y to <2 x i32>
	%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @fptoui_v4f32_v4i32(<2 x float> %x, <2 x float> %y) {			define <4 x i32> @fptoui_v4f32_v4i32(<2 x float> %x, <2 x float> %y) {
	; SSE2-LABEL: fptoui_v4f32_v4i32:			; SSE-LABEL: fptoui_v4f32_v4i32:
	; SSE2: # %bb.0:			; SSE: # %bb.0:
	; SSE2-NEXT: movaps {{.*#+}} xmm3 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]			; SSE-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE2-NEXT: movaps %xmm0, %xmm2			; SSE-NEXT: cvttps2dq %xmm0, %xmm1
	; SSE2-NEXT: cmpltps %xmm3, %xmm2			; SSE-NEXT: movdqa %xmm1, %xmm2
	; SSE2-NEXT: cvttps2dq %xmm0, %xmm4			; SSE-NEXT: psrad $31, %xmm2
	; SSE2-NEXT: subps %xmm3, %xmm0			; SSE-NEXT: subps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; SSE2-NEXT: cvttps2dq %xmm0, %xmm0			; SSE-NEXT: cvttps2dq %xmm0, %xmm0
	; SSE2-NEXT: movaps {{.*#+}} xmm5 = [2147483648,2147483648,2147483648,2147483648]			; SSE-NEXT: pand %xmm2, %xmm0
	; SSE2-NEXT: xorps %xmm5, %xmm0			; SSE-NEXT: por %xmm1, %xmm0
	; SSE2-NEXT: andps %xmm2, %xmm4			; SSE-NEXT: retq
	; SSE2-NEXT: andnps %xmm0, %xmm2
	; SSE2-NEXT: orps %xmm4, %xmm2
	; SSE2-NEXT: movaps %xmm1, %xmm0
	; SSE2-NEXT: cmpltps %xmm3, %xmm0
	; SSE2-NEXT: cvttps2dq %xmm1, %xmm4
	; SSE2-NEXT: subps %xmm3, %xmm1
	; SSE2-NEXT: cvttps2dq %xmm1, %xmm1
	; SSE2-NEXT: xorps %xmm5, %xmm1
	; SSE2-NEXT: andps %xmm0, %xmm4
	; SSE2-NEXT: andnps %xmm1, %xmm0
	; SSE2-NEXT: orps %xmm4, %xmm0
	; SSE2-NEXT: movlhps {{.*#+}} xmm2 = xmm2[0],xmm0[0]
	; SSE2-NEXT: movaps %xmm2, %xmm0
	; SSE2-NEXT: retq
	;
	; SSE4-LABEL: fptoui_v4f32_v4i32:
	; SSE4: # %bb.0:
	; SSE4-NEXT: movaps {{.*#+}} xmm4 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
	; SSE4-NEXT: movaps %xmm0, %xmm2
	; SSE4-NEXT: cmpltps %xmm4, %xmm2
	; SSE4-NEXT: cvttps2dq %xmm0, %xmm5
	; SSE4-NEXT: subps %xmm4, %xmm0
	; SSE4-NEXT: cvttps2dq %xmm0, %xmm3
	; SSE4-NEXT: movaps {{.*#+}} xmm6 = [2147483648,2147483648,2147483648,2147483648]
	; SSE4-NEXT: xorps %xmm6, %xmm3
	; SSE4-NEXT: movaps %xmm2, %xmm0
	; SSE4-NEXT: blendvps %xmm0, %xmm5, %xmm3
	; SSE4-NEXT: movaps %xmm1, %xmm0
	; SSE4-NEXT: cmpltps %xmm4, %xmm0
	; SSE4-NEXT: cvttps2dq %xmm1, %xmm2
	; SSE4-NEXT: subps %xmm4, %xmm1
	; SSE4-NEXT: cvttps2dq %xmm1, %xmm1
	; SSE4-NEXT: xorps %xmm6, %xmm1
	; SSE4-NEXT: blendvps %xmm0, %xmm2, %xmm1
	; SSE4-NEXT: movlhps {{.*#+}} xmm3 = xmm3[0],xmm1[0]
	; SSE4-NEXT: movaps %xmm3, %xmm0
	; SSE4-NEXT: retq
	;			;
	; AVX1-LABEL: fptoui_v4f32_v4i32:			; AVX1-LABEL: fptoui_v4f32_v4i32:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vmovaps {{.*#+}} xmm2 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
	; AVX1-NEXT: vcmpltps %xmm2, %xmm0, %xmm3
	; AVX1-NEXT: vsubps %xmm2, %xmm0, %xmm4
	; AVX1-NEXT: vcvttps2dq %xmm4, %xmm4
	; AVX1-NEXT: vmovaps {{.*#+}} xmm5 = [2147483648,2147483648,2147483648,2147483648]
	; AVX1-NEXT: vxorps %xmm5, %xmm4, %xmm4
	; AVX1-NEXT: vcvttps2dq %xmm0, %xmm0
	; AVX1-NEXT: vblendvps %xmm3, %xmm0, %xmm4, %xmm0
	; AVX1-NEXT: vcmpltps %xmm2, %xmm1, %xmm3
	; AVX1-NEXT: vsubps %xmm2, %xmm1, %xmm2
	; AVX1-NEXT: vcvttps2dq %xmm2, %xmm2
	; AVX1-NEXT: vxorps %xmm5, %xmm2, %xmm2
	; AVX1-NEXT: vcvttps2dq %xmm1, %xmm1
	; AVX1-NEXT: vblendvps %xmm3, %xmm1, %xmm2, %xmm1
	; AVX1-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX1-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
				; AVX1-NEXT: vcvttps2dq %xmm0, %xmm1
				; AVX1-NEXT: vpsrad $31, %xmm1, %xmm2
				; AVX1-NEXT: vsubps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; AVX1-NEXT: vcvttps2dq %xmm0, %xmm0
				; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0
				; AVX1-NEXT: vpor %xmm0, %xmm1, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: fptoui_v4f32_v4i32:			; AVX2-LABEL: fptoui_v4f32_v4i32:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: vbroadcastss {{.*#+}} xmm2 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
	; AVX2-NEXT: vcmpltps %xmm2, %xmm0, %xmm3
	; AVX2-NEXT: vsubps %xmm2, %xmm0, %xmm4
	; AVX2-NEXT: vcvttps2dq %xmm4, %xmm4
	; AVX2-NEXT: vbroadcastss {{.*#+}} xmm5 = [2147483648,2147483648,2147483648,2147483648]
	; AVX2-NEXT: vxorps %xmm5, %xmm4, %xmm4
	; AVX2-NEXT: vcvttps2dq %xmm0, %xmm0
	; AVX2-NEXT: vblendvps %xmm3, %xmm0, %xmm4, %xmm0
	; AVX2-NEXT: vcmpltps %xmm2, %xmm1, %xmm3
	; AVX2-NEXT: vsubps %xmm2, %xmm1, %xmm2
	; AVX2-NEXT: vcvttps2dq %xmm2, %xmm2
	; AVX2-NEXT: vxorps %xmm5, %xmm2, %xmm2
	; AVX2-NEXT: vcvttps2dq %xmm1, %xmm1
	; AVX2-NEXT: vblendvps %xmm3, %xmm1, %xmm2, %xmm1
	; AVX2-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX2-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
				; AVX2-NEXT: vcvttps2dq %xmm0, %xmm1
				; AVX2-NEXT: vpsrad $31, %xmm1, %xmm2
				; AVX2-NEXT: vbroadcastss {{.*#+}} xmm3 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
				; AVX2-NEXT: vsubps %xmm3, %xmm0, %xmm0
				; AVX2-NEXT: vcvttps2dq %xmm0, %xmm0
				; AVX2-NEXT: vpand %xmm2, %xmm0, %xmm0
				; AVX2-NEXT: vpor %xmm0, %xmm1, %xmm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: fptoui_v4f32_v4i32:			; AVX512F-LABEL: fptoui_v4f32_v4i32:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX512F-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX512F-NEXT: vcvttps2udq %zmm0, %zmm0			; AVX512F-NEXT: vcvttps2udq %zmm0, %zmm0
	; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0			; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	(106 lines not shown)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%s0 = fptosi <2 x double> %x to <2 x i32>			%s0 = fptosi <2 x double> %x to <2 x i32>
	%s1 = fptosi <2 x double> %y to <2 x i32>			%s1 = fptosi <2 x double> %y to <2 x i32>
	%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @fptoui_v4f64_v4i32(<2 x double> %x, <2 x double> %y) {			define <4 x i32> @fptoui_v4f64_v4i32(<2 x double> %x, <2 x double> %y) {
	; SSE2-LABEL: fptoui_v4f64_v4i32:			; SSE-LABEL: fptoui_v4f64_v4i32:
	; SSE2: # %bb.0:			; SSE: # %bb.0:
	; SSE2-NEXT: cvttsd2si %xmm0, %rax			; SSE-NEXT: movapd {{.*#+}} xmm2 = [2.147483648E+9,2.147483648E+9]
	; SSE2-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]			; SSE-NEXT: cvttpd2dq %xmm0, %xmm3
	; SSE2-NEXT: cvttsd2si %xmm0, %rcx			; SSE-NEXT: subpd %xmm2, %xmm0
	; SSE2-NEXT: cvttsd2si %xmm1, %rdx			; SSE-NEXT: cvttpd2dq %xmm0, %xmm4
	; SSE2-NEXT: unpckhpd {{.*#+}} xmm1 = xmm1[1,1]			; SSE-NEXT: movapd %xmm3, %xmm0
	; SSE2-NEXT: cvttsd2si %xmm1, %rsi			; SSE-NEXT: psrad $31, %xmm0
	; SSE2-NEXT: movd %edx, %xmm1			; SSE-NEXT: pand %xmm4, %xmm0
	; SSE2-NEXT: movd %esi, %xmm0			; SSE-NEXT: por %xmm3, %xmm0
	; SSE2-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]			; SSE-NEXT: cvttpd2dq %xmm1, %xmm3
	; SSE2-NEXT: movd %eax, %xmm0			; SSE-NEXT: subpd %xmm2, %xmm1
	; SSE2-NEXT: movd %ecx, %xmm2			; SSE-NEXT: cvttpd2dq %xmm1, %xmm1
	; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]			; SSE-NEXT: movapd %xmm3, %xmm2
	; SSE2-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; SSE-NEXT: psrad $31, %xmm2
	; SSE2-NEXT: retq			; SSE-NEXT: pand %xmm1, %xmm2
	;			; SSE-NEXT: por %xmm3, %xmm2
	; SSE4-LABEL: fptoui_v4f64_v4i32:			; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
	; SSE4: # %bb.0:			; SSE-NEXT: retq
	; SSE4-NEXT: cvttsd2si %xmm0, %rax
	; SSE4-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
	; SSE4-NEXT: cvttsd2si %xmm0, %rcx
	; SSE4-NEXT: cvttsd2si %xmm1, %rdx
	; SSE4-NEXT: unpckhpd {{.*#+}} xmm1 = xmm1[1,1]
	; SSE4-NEXT: cvttsd2si %xmm1, %rsi
	; SSE4-NEXT: movd %eax, %xmm0
	; SSE4-NEXT: pinsrd $1, %ecx, %xmm0
	; SSE4-NEXT: pinsrd $2, %edx, %xmm0
	; SSE4-NEXT: pinsrd $3, %esi, %xmm0
	; SSE4-NEXT: retq
	;			;
	; AVX1-LABEL: fptoui_v4f64_v4i32:			; AVX1-LABEL: fptoui_v4f64_v4i32:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: # kill: def $xmm1 killed $xmm1 def $ymm1
	; AVX1-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0			; AVX1-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX1-NEXT: vmovapd {{.*#+}} ymm2 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]			; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX1-NEXT: vcmpltpd %ymm2, %ymm0, %ymm3			; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm1
	; AVX1-NEXT: vpackssdw %xmm3, %xmm3, %xmm3			; AVX1-NEXT: vpsrad $31, %xmm1, %xmm2
	; AVX1-NEXT: vsubpd %ymm2, %ymm0, %ymm4			; AVX1-NEXT: vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: vcvttpd2dq %ymm4, %xmm4
	; AVX1-NEXT: vmovapd {{.*#+}} xmm5 = [2147483648,2147483648,2147483648,2147483648]
	; AVX1-NEXT: vxorpd %xmm5, %xmm4, %xmm4
	; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm0			; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm0
	; AVX1-NEXT: vblendvps %xmm3, %xmm0, %xmm4, %xmm0			; AVX1-NEXT: vandpd %xmm2, %xmm0, %xmm0
	; AVX1-NEXT: vcmpltpd %ymm2, %ymm1, %ymm3			; AVX1-NEXT: vorpd %xmm0, %xmm1, %xmm0
	; AVX1-NEXT: vpackssdw %xmm3, %xmm3, %xmm3
	; AVX1-NEXT: vsubpd %ymm2, %ymm1, %ymm2
	; AVX1-NEXT: vcvttpd2dq %ymm2, %xmm2
	; AVX1-NEXT: vxorpd %xmm5, %xmm2, %xmm2
	; AVX1-NEXT: vcvttpd2dq %ymm1, %xmm1
	; AVX1-NEXT: vblendvps %xmm3, %xmm1, %xmm2, %xmm1
	; AVX1-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: fptoui_v4f64_v4i32:			; AVX2-LABEL: fptoui_v4f64_v4i32:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: # kill: def $xmm1 killed $xmm1 def $ymm1
	; AVX2-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0			; AVX2-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX2-NEXT: vbroadcastsd {{.*#+}} ymm2 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]			; AVX2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX2-NEXT: vcmpltpd %ymm2, %ymm0, %ymm3			; AVX2-NEXT: vcvttpd2dq %ymm0, %xmm1
	; AVX2-NEXT: vpackssdw %xmm3, %xmm3, %xmm3			; AVX2-NEXT: vpsrad $31, %xmm1, %xmm2
	; AVX2-NEXT: vsubpd %ymm2, %ymm0, %ymm4			; AVX2-NEXT: vbroadcastsd {{.*#+}} ymm3 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]
	; AVX2-NEXT: vcvttpd2dq %ymm4, %xmm4			; AVX2-NEXT: vsubpd %ymm3, %ymm0, %ymm0
	; AVX2-NEXT: vbroadcastss {{.*#+}} xmm5 = [2147483648,2147483648,2147483648,2147483648]
	; AVX2-NEXT: vxorpd %xmm5, %xmm4, %xmm4
	; AVX2-NEXT: vcvttpd2dq %ymm0, %xmm0			; AVX2-NEXT: vcvttpd2dq %ymm0, %xmm0
	; AVX2-NEXT: vblendvps %xmm3, %xmm0, %xmm4, %xmm0			; AVX2-NEXT: vandpd %xmm2, %xmm0, %xmm0
	; AVX2-NEXT: vcmpltpd %ymm2, %ymm1, %ymm3			; AVX2-NEXT: vorpd %xmm0, %xmm1, %xmm0
	; AVX2-NEXT: vpackssdw %xmm3, %xmm3, %xmm3
	; AVX2-NEXT: vsubpd %ymm2, %ymm1, %ymm2
	; AVX2-NEXT: vcvttpd2dq %ymm2, %xmm2
	; AVX2-NEXT: vxorpd %xmm5, %xmm2, %xmm2
	; AVX2-NEXT: vcvttpd2dq %ymm1, %xmm1
	; AVX2-NEXT: vblendvps %xmm3, %xmm1, %xmm2, %xmm1
	; AVX2-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: fptoui_v4f64_v4i32:			; AVX512F-LABEL: fptoui_v4f64_v4i32:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0			; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX512F-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX512F-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX512F-NEXT: vcvttpd2udq %zmm0, %ymm0			; AVX512F-NEXT: vcvttpd2udq %zmm0, %ymm0
	(163 lines not shown)
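
For readers following the new SSE/AVX check lines above (cvttps2dq, psrad $31, subps, cvttps2dq, pand, por), here is a minimal per-lane model of the formula from the summary. It is only a sketch: the helper names `f32`, `cvttps2si_lane` and `fp_to_ui_lane` are made up for illustration and are not part of the patch, and the model assumes that CVTTPS2SI returns 0x80000000 for NaN and for any input outside the signed 32-bit range, as the summary states.

```python
import math
import struct

INDEFINITE = 0x80000000  # value assumed to be returned for out-of-range inputs


def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def cvttps2si_lane(x):
    """Hypothetical model of one CVTTPS2SI lane; returns the raw 32 result bits."""
    if math.isnan(x) or x >= 2.0**31 or x < -(2.0**31):
        return INDEFINITE
    return int(x) & 0xFFFFFFFF  # int() truncates toward zero


def fp_to_ui_lane(x):
    """small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))"""
    x = f32(x)
    small = cvttps2si_lane(x)
    # psrad $31 turns the sign bit into an all-ones or all-zeros mask.
    mask = 0xFFFFFFFF if small & 0x80000000 else 0
    # subps rounds the difference to single precision before converting.
    big = cvttps2si_lane(f32(x - 2.0**31))
    return small | (big & mask)


if __name__ == "__main__":
    for value in (0.0, 1.5, 2147483648.0, 3000000000.0, 4294967040.0):
        print(value, fp_to_ui_lane(value))
```

For inputs below 2^31 the mask is zero, so the second conversion is discarded. For inputs in [2^31, 2^32) the first conversion yields 0x80000000, which both sets the mask and supplies the top bit of the result.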

llvm/test/CodeGen/X86/fptoui-sat-scalar.ll

	(363 lines not shown)
	; X86-X87-NEXT: movl %ecx, %eax			; X86-X87-NEXT: movl %ecx, %eax
	; X86-X87-NEXT: .LBB5_4:			; X86-X87-NEXT: .LBB5_4:
	; X86-X87-NEXT: addl $20, %esp			; X86-X87-NEXT: addl $20, %esp
	; X86-X87-NEXT: retl			; X86-X87-NEXT: retl
	;			;
	; X86-SSE-LABEL: test_unsigned_i32_f32:			; X86-SSE-LABEL: test_unsigned_i32_f32:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X86-SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X86-SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; X86-SSE-NEXT: cvttss2si %xmm0, %eax
	; X86-SSE-NEXT: movaps %xmm0, %xmm2			; X86-SSE-NEXT: movl %eax, %ecx
	; X86-SSE-NEXT: subss %xmm1, %xmm2			; X86-SSE-NEXT: sarl $31, %ecx
	; X86-SSE-NEXT: cvttss2si %xmm2, %eax			; X86-SSE-NEXT: movaps %xmm0, %xmm1
	; X86-SSE-NEXT: xorl $-2147483648, %eax # imm = 0x80000000			; X86-SSE-NEXT: subss {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1
	; X86-SSE-NEXT: cvttss2si %xmm0, %ecx			; X86-SSE-NEXT: cvttss2si %xmm1, %edx
	; X86-SSE-NEXT: ucomiss %xmm0, %xmm1			; X86-SSE-NEXT: andl %ecx, %edx
	; X86-SSE-NEXT: cmovbel %eax, %ecx			; X86-SSE-NEXT: orl %eax, %edx
	; X86-SSE-NEXT: xorl %edx, %edx			; X86-SSE-NEXT: xorl %ecx, %ecx
	; X86-SSE-NEXT: xorps %xmm1, %xmm1			; X86-SSE-NEXT: xorps %xmm1, %xmm1
	; X86-SSE-NEXT: ucomiss %xmm1, %xmm0			; X86-SSE-NEXT: ucomiss %xmm1, %xmm0
	; X86-SSE-NEXT: cmovael %ecx, %edx			; X86-SSE-NEXT: cmovael %edx, %ecx
	; X86-SSE-NEXT: ucomiss {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: ucomiss {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: movl $-1, %eax			; X86-SSE-NEXT: movl $-1, %eax
	; X86-SSE-NEXT: cmovbel %edx, %eax			; X86-SSE-NEXT: cmovbel %ecx, %eax
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	;			;
	; X64-LABEL: test_unsigned_i32_f32:			; X64-LABEL: test_unsigned_i32_f32:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: cvttss2si %xmm0, %rax			; X64-NEXT: cvttss2si %xmm0, %rax
	; X64-NEXT: xorl %ecx, %ecx			; X64-NEXT: xorl %ecx, %ecx
	; X64-NEXT: xorps %xmm1, %xmm1			; X64-NEXT: xorps %xmm1, %xmm1
	; X64-NEXT: ucomiss %xmm1, %xmm0			; X64-NEXT: ucomiss %xmm1, %xmm0
	(236 lines not shown)
	; X86-SSE-NEXT: movl $-1, %ecx			; X86-SSE-NEXT: movl $-1, %ecx
	; X86-SSE-NEXT: cmoval %ecx, %edx			; X86-SSE-NEXT: cmoval %ecx, %edx
	; X86-SSE-NEXT: cmoval %ecx, %eax			; X86-SSE-NEXT: cmoval %ecx, %eax
	; X86-SSE-NEXT: addl $20, %esp			; X86-SSE-NEXT: addl $20, %esp
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	;			;
	; X64-LABEL: test_unsigned_i64_f32:			; X64-LABEL: test_unsigned_i64_f32:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; X64-NEXT: movaps %xmm0, %xmm2
	; X64-NEXT: subss %xmm1, %xmm2
	; X64-NEXT: cvttss2si %xmm2, %rax
	; X64-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
	; X64-NEXT: xorq %rax, %rcx
	; X64-NEXT: cvttss2si %xmm0, %rax			; X64-NEXT: cvttss2si %xmm0, %rax
	; X64-NEXT: ucomiss %xmm1, %xmm0			; X64-NEXT: movq %rax, %rcx
	; X64-NEXT: cmovaeq %rcx, %rax			; X64-NEXT: sarq $63, %rcx
				; X64-NEXT: movaps %xmm0, %xmm1
				; X64-NEXT: subss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
				; X64-NEXT: cvttss2si %xmm1, %rdx
				; X64-NEXT: andq %rcx, %rdx
				; X64-NEXT: orq %rax, %rdx
	; X64-NEXT: xorl %ecx, %ecx			; X64-NEXT: xorl %ecx, %ecx
	; X64-NEXT: xorps %xmm1, %xmm1			; X64-NEXT: xorps %xmm1, %xmm1
	; X64-NEXT: ucomiss %xmm1, %xmm0			; X64-NEXT: ucomiss %xmm1, %xmm0
	; X64-NEXT: cmovaeq %rax, %rcx			; X64-NEXT: cmovaeq %rdx, %rcx
	; X64-NEXT: ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; X64-NEXT: ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; X64-NEXT: movq $-1, %rax			; X64-NEXT: movq $-1, %rax
	; X64-NEXT: cmovbeq %rcx, %rax			; X64-NEXT: cmovbeq %rcx, %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%x = call i64 @llvm.fptoui.sat.i64.f32(float %f)			%x = call i64 @llvm.fptoui.sat.i64.f32(float %f)
	ret i64 %x			ret i64 %x
	}			}

	(644 lines not shown)
	;			;
	; X86-SSE-LABEL: test_unsigned_i32_f64:			; X86-SSE-LABEL: test_unsigned_i32_f64:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X86-SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X86-SSE-NEXT: xorpd %xmm1, %xmm1			; X86-SSE-NEXT: xorpd %xmm1, %xmm1
	; X86-SSE-NEXT: maxsd %xmm1, %xmm0			; X86-SSE-NEXT: maxsd %xmm1, %xmm0
	; X86-SSE-NEXT: minsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: minsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: cvttsd2si %xmm0, %ecx			; X86-SSE-NEXT: cvttsd2si %xmm0, %ecx
	; X86-SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; X86-SSE-NEXT: movl %ecx, %edx
	; X86-SSE-NEXT: movapd %xmm0, %xmm2			; X86-SSE-NEXT: sarl $31, %edx
	; X86-SSE-NEXT: subsd %xmm1, %xmm2			; X86-SSE-NEXT: subsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: cvttsd2si %xmm2, %eax			; X86-SSE-NEXT: cvttsd2si %xmm0, %eax
	; X86-SSE-NEXT: xorl $-2147483648, %eax # imm = 0x80000000			; X86-SSE-NEXT: andl %edx, %eax
	; X86-SSE-NEXT: ucomisd %xmm1, %xmm0			; X86-SSE-NEXT: orl %ecx, %eax
	; X86-SSE-NEXT: cmovbl %ecx, %eax
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	;			;
	; X64-LABEL: test_unsigned_i32_f64:			; X64-LABEL: test_unsigned_i32_f64:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: xorpd %xmm1, %xmm1			; X64-NEXT: xorpd %xmm1, %xmm1
	; X64-NEXT: maxsd %xmm0, %xmm1			; X64-NEXT: maxsd %xmm0, %xmm1
	; X64-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X64-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X64-NEXT: minsd %xmm1, %xmm0			; X64-NEXT: minsd %xmm1, %xmm0
	(230 lines not shown)
	; X86-SSE-NEXT: movl $-1, %ecx			; X86-SSE-NEXT: movl $-1, %ecx
	; X86-SSE-NEXT: cmoval %ecx, %edx			; X86-SSE-NEXT: cmoval %ecx, %edx
	; X86-SSE-NEXT: cmoval %ecx, %eax			; X86-SSE-NEXT: cmoval %ecx, %eax
	; X86-SSE-NEXT: addl $20, %esp			; X86-SSE-NEXT: addl $20, %esp
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	;			;
	; X64-LABEL: test_unsigned_i64_f64:			; X64-LABEL: test_unsigned_i64_f64:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; X64-NEXT: movapd %xmm0, %xmm2
	; X64-NEXT: subsd %xmm1, %xmm2
	; X64-NEXT: cvttsd2si %xmm2, %rax
	; X64-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
	; X64-NEXT: xorq %rax, %rcx
	; X64-NEXT: cvttsd2si %xmm0, %rax			; X64-NEXT: cvttsd2si %xmm0, %rax
	; X64-NEXT: ucomisd %xmm1, %xmm0			; X64-NEXT: movq %rax, %rcx
	; X64-NEXT: cmovaeq %rcx, %rax			; X64-NEXT: sarq $63, %rcx
				; X64-NEXT: movapd %xmm0, %xmm1
				; X64-NEXT: subsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
				; X64-NEXT: cvttsd2si %xmm1, %rdx
				; X64-NEXT: andq %rcx, %rdx
				; X64-NEXT: orq %rax, %rdx
	; X64-NEXT: xorl %ecx, %ecx			; X64-NEXT: xorl %ecx, %ecx
	; X64-NEXT: xorpd %xmm1, %xmm1			; X64-NEXT: xorpd %xmm1, %xmm1
	; X64-NEXT: ucomisd %xmm1, %xmm0			; X64-NEXT: ucomisd %xmm1, %xmm0
	; X64-NEXT: cmovaeq %rax, %rcx			; X64-NEXT: cmovaeq %rdx, %rcx
	; X64-NEXT: ucomisd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; X64-NEXT: ucomisd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; X64-NEXT: movq $-1, %rax			; X64-NEXT: movq $-1, %rax
	; X64-NEXT: cmovbeq %rcx, %rax			; X64-NEXT: cmovbeq %rcx, %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%x = call i64 @llvm.fptoui.sat.i64.f64(double %f)			%x = call i64 @llvm.fptoui.sat.i64.f64(double %f)
	ret i64 %x			ret i64 %x
	}			}

	(707 lines not shown)
	; X86-SSE-LABEL: test_unsigned_i32_f16:			; X86-SSE-LABEL: test_unsigned_i32_f16:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: subl $12, %esp			; X86-SSE-NEXT: subl $12, %esp
	; X86-SSE-NEXT: movzwl {{[0-9]+}}(%esp), %eax			; X86-SSE-NEXT: movzwl {{[0-9]+}}(%esp), %eax
	; X86-SSE-NEXT: movl %eax, (%esp)			; X86-SSE-NEXT: movl %eax, (%esp)
	; X86-SSE-NEXT: calll __gnu_h2f_ieee			; X86-SSE-NEXT: calll __gnu_h2f_ieee
	; X86-SSE-NEXT: fstps {{[0-9]+}}(%esp)			; X86-SSE-NEXT: fstps {{[0-9]+}}(%esp)
	; X86-SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X86-SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X86-SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; X86-SSE-NEXT: cvttss2si %xmm0, %eax
	; X86-SSE-NEXT: movaps %xmm0, %xmm2			; X86-SSE-NEXT: movl %eax, %ecx
	; X86-SSE-NEXT: subss %xmm1, %xmm2			; X86-SSE-NEXT: sarl $31, %ecx
	; X86-SSE-NEXT: cvttss2si %xmm2, %eax			; X86-SSE-NEXT: movaps %xmm0, %xmm1
	; X86-SSE-NEXT: xorl $-2147483648, %eax # imm = 0x80000000			; X86-SSE-NEXT: subss {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1
	; X86-SSE-NEXT: cvttss2si %xmm0, %ecx			; X86-SSE-NEXT: cvttss2si %xmm1, %edx
	; X86-SSE-NEXT: ucomiss %xmm1, %xmm0			; X86-SSE-NEXT: andl %ecx, %edx
	; X86-SSE-NEXT: cmovael %eax, %ecx			; X86-SSE-NEXT: orl %eax, %edx
	; X86-SSE-NEXT: xorl %edx, %edx			; X86-SSE-NEXT: xorl %ecx, %ecx
	; X86-SSE-NEXT: xorps %xmm1, %xmm1			; X86-SSE-NEXT: xorps %xmm1, %xmm1
	; X86-SSE-NEXT: ucomiss %xmm1, %xmm0			; X86-SSE-NEXT: ucomiss %xmm1, %xmm0
	; X86-SSE-NEXT: cmovael %ecx, %edx			; X86-SSE-NEXT: cmovael %edx, %ecx
	; X86-SSE-NEXT: ucomiss {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE-NEXT: ucomiss {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE-NEXT: movl $-1, %eax			; X86-SSE-NEXT: movl $-1, %eax
	; X86-SSE-NEXT: cmovbel %edx, %eax			; X86-SSE-NEXT: cmovbel %ecx, %eax
	; X86-SSE-NEXT: addl $12, %esp			; X86-SSE-NEXT: addl $12, %esp
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	;			;
	; X64-LABEL: test_unsigned_i32_f16:			; X64-LABEL: test_unsigned_i32_f16:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: pushq %rax			; X64-NEXT: pushq %rax
	; X64-NEXT: movzwl %di, %edi			; X64-NEXT: movzwl %di, %edi
	; X64-NEXT: callq __gnu_h2f_ieee@PLT			; X64-NEXT: callq __gnu_h2f_ieee@PLT
	(260 lines not shown)
	; X86-SSE-NEXT: addl $28, %esp			; X86-SSE-NEXT: addl $28, %esp
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	;			;
	; X64-LABEL: test_unsigned_i64_f16:			; X64-LABEL: test_unsigned_i64_f16:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: pushq %rax			; X64-NEXT: pushq %rax
	; X64-NEXT: movzwl %di, %edi			; X64-NEXT: movzwl %di, %edi
	; X64-NEXT: callq __gnu_h2f_ieee@PLT			; X64-NEXT: callq __gnu_h2f_ieee@PLT
	; X64-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; X64-NEXT: movaps %xmm0, %xmm2
	; X64-NEXT: subss %xmm1, %xmm2
	; X64-NEXT: cvttss2si %xmm2, %rax
	; X64-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
	; X64-NEXT: xorq %rax, %rcx
	; X64-NEXT: cvttss2si %xmm0, %rax			; X64-NEXT: cvttss2si %xmm0, %rax
	; X64-NEXT: ucomiss %xmm1, %xmm0			; X64-NEXT: movq %rax, %rcx
	; X64-NEXT: cmovaeq %rcx, %rax			; X64-NEXT: sarq $63, %rcx
				; X64-NEXT: movaps %xmm0, %xmm1
				; X64-NEXT: subss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
				; X64-NEXT: cvttss2si %xmm1, %rdx
				; X64-NEXT: andq %rcx, %rdx
				; X64-NEXT: orq %rax, %rdx
	; X64-NEXT: xorl %ecx, %ecx			; X64-NEXT: xorl %ecx, %ecx
	; X64-NEXT: xorps %xmm1, %xmm1			; X64-NEXT: xorps %xmm1, %xmm1
	; X64-NEXT: ucomiss %xmm1, %xmm0			; X64-NEXT: ucomiss %xmm1, %xmm0
	; X64-NEXT: cmovaeq %rax, %rcx			; X64-NEXT: cmovaeq %rdx, %rcx
	; X64-NEXT: ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; X64-NEXT: ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; X64-NEXT: movq $-1, %rax			; X64-NEXT: movq $-1, %rax
	; X64-NEXT: cmovbeq %rcx, %rax			; X64-NEXT: cmovbeq %rcx, %rax
	; X64-NEXT: popq %rcx			; X64-NEXT: popq %rcx
	; X64-NEXT: retq			; X64-NEXT: retq
	%x = call i64 @llvm.fptoui.sat.i64.f16(half %f)			%x = call i64 @llvm.fptoui.sat.i64.f16(half %f)
	ret i64 %x			ret i64 %x
	}			}
	(1,525 lines not shown)

llvm/test/CodeGen/X86/ftrunc.ll

(23 lines not shown)
; AVX1-NEXT: retq
%i = fptoui float %x to i32		%i = fptoui float %x to i32
%r = uitofp i32 %i to float		%r = uitofp i32 %i to float
ret float %r		ret float %r
}		}

define double @trunc_unsigned_f64(double %x) #0 {		define double @trunc_unsigned_f64(double %x) #0 {
; SSE2-LABEL: trunc_unsigned_f64:		; SSE2-LABEL: trunc_unsigned_f64:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
; SSE2-NEXT: movapd %xmm0, %xmm2
; SSE2-NEXT: subsd %xmm1, %xmm2
; SSE2-NEXT: cvttsd2si %xmm2, %rax
; SSE2-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
; SSE2-NEXT: xorq %rax, %rcx
; SSE2-NEXT: cvttsd2si %xmm0, %rax		; SSE2-NEXT: cvttsd2si %xmm0, %rax
; SSE2-NEXT: ucomisd %xmm1, %xmm0		; SSE2-NEXT: movq %rax, %rcx
; SSE2-NEXT: cmovaeq %rcx, %rax		; SSE2-NEXT: sarq $63, %rcx
; SSE2-NEXT: movq %rax, %xmm1		; SSE2-NEXT: subsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
		; SSE2-NEXT: cvttsd2si %xmm0, %rdx
		; SSE2-NEXT: andq %rcx, %rdx
		; SSE2-NEXT: orq %rax, %rdx
		; SSE2-NEXT: movq %rdx, %xmm1
; SSE2-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[1],mem[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[1],mem[1]
; SSE2-NEXT: subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1		; SSE2-NEXT: subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; SSE2-NEXT: movapd %xmm1, %xmm0		; SSE2-NEXT: movapd %xmm1, %xmm0
; SSE2-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1],xmm1[1]		; SSE2-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1],xmm1[1]
; SSE2-NEXT: addsd %xmm1, %xmm0		; SSE2-NEXT: addsd %xmm1, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE41-LABEL: trunc_unsigned_f64:		; SSE41-LABEL: trunc_unsigned_f64:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: roundsd $11, %xmm0, %xmm0		; SSE41-NEXT: roundsd $11, %xmm0, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: trunc_unsigned_f64:		; AVX1-LABEL: trunc_unsigned_f64:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vroundsd $11, %xmm0, %xmm0, %xmm0		; AVX1-NEXT: vroundsd $11, %xmm0, %xmm0, %xmm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
%i = fptoui double %x to i64		%i = fptoui double %x to i64
%r = uitofp i64 %i to double		%r = uitofp i64 %i to double
ret double %r		ret double %r
}		}

define <4 x float> @trunc_unsigned_v4f32(<4 x float> %x) #0 {		define <4 x float> @trunc_unsigned_v4f32(<4 x float> %x) #0 {
; SSE2-LABEL: trunc_unsigned_v4f32:		; SSE2-LABEL: trunc_unsigned_v4f32:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: movaps {{.*#+}} xmm2 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]		; SSE2-NEXT: cvttps2dq %xmm0, %xmm1
; SSE2-NEXT: movaps %xmm0, %xmm1		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: cmpltps %xmm2, %xmm1		; SSE2-NEXT: psrad $31, %xmm2
; SSE2-NEXT: cvttps2dq %xmm0, %xmm3		; SSE2-NEXT: subps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE2-NEXT: subps %xmm2, %xmm0
; SSE2-NEXT: cvttps2dq %xmm0, %xmm0		; SSE2-NEXT: cvttps2dq %xmm0, %xmm0
; SSE2-NEXT: xorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0		; SSE2-NEXT: pand %xmm2, %xmm0
; SSE2-NEXT: andps %xmm1, %xmm3		; SSE2-NEXT: por %xmm1, %xmm0
; SSE2-NEXT: andnps %xmm0, %xmm1		; SSE2-NEXT: movdqa {{.*#+}} xmm1 = [65535,65535,65535,65535]
; SSE2-NEXT: orps %xmm3, %xmm1		; SSE2-NEXT: pand %xmm0, %xmm1
; SSE2-NEXT: movaps {{.*#+}} xmm0 = [65535,65535,65535,65535]
; SSE2-NEXT: andps %xmm1, %xmm0
; SSE2-NEXT: orps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE2-NEXT: psrld $16, %xmm1
; SSE2-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1		; SSE2-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; SSE2-NEXT: subps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1		; SSE2-NEXT: psrld $16, %xmm0
; SSE2-NEXT: addps %xmm0, %xmm1		; SSE2-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE2-NEXT: movaps %xmm1, %xmm0		; SSE2-NEXT: subps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
		; SSE2-NEXT: addps %xmm1, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE41-LABEL: trunc_unsigned_v4f32:		; SSE41-LABEL: trunc_unsigned_v4f32:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: roundps $11, %xmm0, %xmm0		; SSE41-NEXT: roundps $11, %xmm0, %xmm0
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: trunc_unsigned_v4f32:		; AVX1-LABEL: trunc_unsigned_v4f32:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vroundps $11, %xmm0, %xmm0		; AVX1-NEXT: vroundps $11, %xmm0, %xmm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
%i = fptoui <4 x float> %x to <4 x i32>		%i = fptoui <4 x float> %x to <4 x i32>
%r = uitofp <4 x i32> %i to <4 x float>		%r = uitofp <4 x i32> %i to <4 x float>
ret <4 x float> %r		ret <4 x float> %r
}		}

define <2 x double> @trunc_unsigned_v2f64(<2 x double> %x) #0 {		define <2 x double> @trunc_unsigned_v2f64(<2 x double> %x) #0 {
; SSE2-LABEL: trunc_unsigned_v2f64:		; SSE2-LABEL: trunc_unsigned_v2f64:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero		; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
; SSE2-NEXT: movapd %xmm0, %xmm1		; SSE2-NEXT: movapd %xmm0, %xmm1
; SSE2-NEXT: subsd %xmm2, %xmm1		; SSE2-NEXT: subsd %xmm2, %xmm1
; SSE2-NEXT: cvttsd2si %xmm1, %rax		; SSE2-NEXT: cvttsd2si %xmm1, %rax
; SSE2-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; SSE2-NEXT: cvttsd2si %xmm0, %rcx
; SSE2-NEXT: xorq %rcx, %rax		; SSE2-NEXT: movq %rcx, %rdx
; SSE2-NEXT: cvttsd2si %xmm0, %rdx		; SSE2-NEXT: sarq $63, %rdx
; SSE2-NEXT: ucomisd %xmm2, %xmm0		; SSE2-NEXT: andq %rax, %rdx
; SSE2-NEXT: cmovaeq %rax, %rdx		; SSE2-NEXT: orq %rcx, %rdx
; SSE2-NEXT: movq %rdx, %xmm1		; SSE2-NEXT: movq %rdx, %xmm1
; SSE2-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]		; SSE2-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
; SSE2-NEXT: movapd %xmm0, %xmm3		; SSE2-NEXT: cvttsd2si %xmm0, %rax
; SSE2-NEXT: subsd %xmm2, %xmm3		; SSE2-NEXT: subsd %xmm2, %xmm0
; SSE2-NEXT: cvttsd2si %xmm3, %rax
; SSE2-NEXT: xorq %rcx, %rax
; SSE2-NEXT: cvttsd2si %xmm0, %rcx		; SSE2-NEXT: cvttsd2si %xmm0, %rcx
; SSE2-NEXT: ucomisd %xmm2, %xmm0		; SSE2-NEXT: movq %rax, %rdx
; SSE2-NEXT: cmovaeq %rax, %rcx		; SSE2-NEXT: sarq $63, %rdx
; SSE2-NEXT: movq %rcx, %xmm0		; SSE2-NEXT: andq %rcx, %rdx
		; SSE2-NEXT: orq %rax, %rdx
		; SSE2-NEXT: movq %rdx, %xmm0
; SSE2-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]		; SSE2-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
; SSE2-NEXT: movdqa {{.*#+}} xmm0 = [4294967295,4294967295]		; SSE2-NEXT: movdqa {{.*#+}} xmm0 = [4294967295,4294967295]
; SSE2-NEXT: pand %xmm1, %xmm0		; SSE2-NEXT: pand %xmm1, %xmm0
; SSE2-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0		; SSE2-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE2-NEXT: psrlq $32, %xmm1		; SSE2-NEXT: psrlq $32, %xmm1
; SSE2-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1		; SSE2-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; SSE2-NEXT: subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1		; SSE2-NEXT: subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; SSE2-NEXT: addpd %xmm0, %xmm1		; SSE2-NEXT: addpd %xmm0, %xmm1
[15 lines not shown]
}		}

define <4 x double> @trunc_unsigned_v4f64(<4 x double> %x) #0 {		define <4 x double> @trunc_unsigned_v4f64(<4 x double> %x) #0 {
; SSE2-LABEL: trunc_unsigned_v4f64:		; SSE2-LABEL: trunc_unsigned_v4f64:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: movapd %xmm1, %xmm2		; SSE2-NEXT: movapd %xmm1, %xmm2
; SSE2-NEXT: movsd {{.*#+}} xmm3 = mem[0],zero		; SSE2-NEXT: movsd {{.*#+}} xmm3 = mem[0],zero
; SSE2-NEXT: subsd %xmm3, %xmm1		; SSE2-NEXT: subsd %xmm3, %xmm1
; SSE2-NEXT: cvttsd2si %xmm1, %rcx		; SSE2-NEXT: cvttsd2si %xmm1, %rax
; SSE2-NEXT: movabsq $-9223372036854775808, %rax # imm = 0x8000000000000000		; SSE2-NEXT: cvttsd2si %xmm2, %rcx
; SSE2-NEXT: xorq %rax, %rcx		; SSE2-NEXT: movq %rcx, %rdx
; SSE2-NEXT: cvttsd2si %xmm2, %rdx		; SSE2-NEXT: sarq $63, %rdx
; SSE2-NEXT: ucomisd %xmm3, %xmm2		; SSE2-NEXT: andq %rax, %rdx
; SSE2-NEXT: cmovaeq %rcx, %rdx		; SSE2-NEXT: orq %rcx, %rdx
; SSE2-NEXT: movq %rdx, %xmm1		; SSE2-NEXT: movq %rdx, %xmm1
; SSE2-NEXT: unpckhpd {{.*#+}} xmm2 = xmm2[1,1]		; SSE2-NEXT: unpckhpd {{.*#+}} xmm2 = xmm2[1,1]
; SSE2-NEXT: movapd %xmm2, %xmm4		; SSE2-NEXT: cvttsd2si %xmm2, %rax
; SSE2-NEXT: subsd %xmm3, %xmm4		; SSE2-NEXT: subsd %xmm3, %xmm2
; SSE2-NEXT: cvttsd2si %xmm4, %rcx		; SSE2-NEXT: cvttsd2si %xmm2, %rcx
; SSE2-NEXT: xorq %rax, %rcx		; SSE2-NEXT: movq %rax, %rdx
; SSE2-NEXT: cvttsd2si %xmm2, %rdx		; SSE2-NEXT: sarq $63, %rdx
; SSE2-NEXT: ucomisd %xmm3, %xmm2		; SSE2-NEXT: andq %rcx, %rdx
; SSE2-NEXT: cmovaeq %rcx, %rdx		; SSE2-NEXT: orq %rax, %rdx
; SSE2-NEXT: movq %rdx, %xmm2		; SSE2-NEXT: movq %rdx, %xmm2
; SSE2-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]		; SSE2-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
; SSE2-NEXT: movapd %xmm0, %xmm2		; SSE2-NEXT: movapd %xmm0, %xmm2
; SSE2-NEXT: subsd %xmm3, %xmm2		; SSE2-NEXT: subsd %xmm3, %xmm2
; SSE2-NEXT: cvttsd2si %xmm2, %rcx		; SSE2-NEXT: cvttsd2si %xmm2, %rax
; SSE2-NEXT: xorq %rax, %rcx		; SSE2-NEXT: cvttsd2si %xmm0, %rcx
; SSE2-NEXT: cvttsd2si %xmm0, %rdx		; SSE2-NEXT: movq %rcx, %rdx
; SSE2-NEXT: ucomisd %xmm3, %xmm0		; SSE2-NEXT: sarq $63, %rdx
; SSE2-NEXT: cmovaeq %rcx, %rdx		; SSE2-NEXT: andq %rax, %rdx
		; SSE2-NEXT: orq %rcx, %rdx
; SSE2-NEXT: movq %rdx, %xmm2		; SSE2-NEXT: movq %rdx, %xmm2
; SSE2-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]		; SSE2-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
; SSE2-NEXT: movapd %xmm0, %xmm4
; SSE2-NEXT: subsd %xmm3, %xmm4
; SSE2-NEXT: cvttsd2si %xmm4, %rcx
; SSE2-NEXT: xorq %rax, %rcx
; SSE2-NEXT: cvttsd2si %xmm0, %rax		; SSE2-NEXT: cvttsd2si %xmm0, %rax
; SSE2-NEXT: ucomisd %xmm3, %xmm0		; SSE2-NEXT: subsd %xmm3, %xmm0
; SSE2-NEXT: cmovaeq %rcx, %rax		; SSE2-NEXT: cvttsd2si %xmm0, %rcx
; SSE2-NEXT: movq %rax, %xmm0		; SSE2-NEXT: movq %rax, %rdx
		; SSE2-NEXT: sarq $63, %rdx
		; SSE2-NEXT: andq %rcx, %rdx
		; SSE2-NEXT: orq %rax, %rdx
		; SSE2-NEXT: movq %rdx, %xmm0
; SSE2-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm0[0]		; SSE2-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm0[0]
; SSE2-NEXT: movdqa {{.*#+}} xmm0 = [4294967295,4294967295]		; SSE2-NEXT: movdqa {{.*#+}} xmm0 = [4294967295,4294967295]
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: pand %xmm0, %xmm3		; SSE2-NEXT: pand %xmm0, %xmm3
; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [4841369599423283200,4841369599423283200]		; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [4841369599423283200,4841369599423283200]
; SSE2-NEXT: por %xmm4, %xmm3		; SSE2-NEXT: por %xmm4, %xmm3
; SSE2-NEXT: psrlq $32, %xmm2		; SSE2-NEXT: psrlq $32, %xmm2
; SSE2-NEXT: movdqa {{.*#+}} xmm5 = [4985484787499139072,4985484787499139072]		; SSE2-NEXT: movdqa {{.*#+}} xmm5 = [4985484787499139072,4985484787499139072]
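
The SSE2 check lines for trunc_unsigned_f64 above replace a compare-and-select sequence (ucomisd and cmovaeq, plus an xorq with 0x8000000000000000) with a mask derived from the sign of the first conversion (sarq $63, andq, orq). The C sketch below models both sequences for the 64-bit case so they can be compared on sample inputs. It is an illustration under stated assumptions, not code from the patch: the names cvtt_f64_i64, fptoui_old and fptoui_new are hypothetical, and the helper only emulates the CVTTSD2SI behaviour of returning 0x8000000000000000 for NaN and out-of-range inputs.

```c
#include <stdint.h>

/* Hypothetical helper: emulates CVTTSD2SI with a 64-bit destination.
   Truncates toward zero; NaN and out-of-range inputs yield INT64_MIN. */
static int64_t cvtt_f64_i64(double x) {
    if (!(x >= -9223372036854775808.0 && x < 9223372036854775808.0))
        return INT64_MIN;
    return (int64_t)x;
}

/* Model of the previous sequence: convert x and x - 2^63, flip the top
   bit of the second result, then select on the comparison x >= 2^63. */
static uint64_t fptoui_old(double x) {
    const double two63 = 9223372036854775808.0;
    uint64_t big = (uint64_t)cvtt_f64_i64(x - two63) ^ 0x8000000000000000ull;
    uint64_t small = (uint64_t)cvtt_f64_i64(x);
    return x >= two63 ? big : small;
}

/* Model of the new sequence: the sign bit of the first conversion acts
   as the selector, so no floating-point comparison is needed. */
static uint64_t fptoui_new(double x) {
    int64_t small = cvtt_f64_i64(x);                   /* cvttsd2si        */
    int64_t mask = -(int64_t)((uint64_t)small >> 63);  /* same as sarq $63 */
    int64_t big = cvtt_f64_i64(x - 9223372036854775808.0);
    return (uint64_t)(small | (big & mask));           /* andq, orq        */
}
```

For inputs in the range the summary says was verified (greater than -1 and below the unsigned maximum), the two models are expected to agree. Inputs outside that range are not covered by this sketch.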

llvm/test/CodeGen/X86/half.ll

	}			}

	define i64 @test_fptoui_i64(half* %p) #0 {			define i64 @test_fptoui_i64(half* %p) #0 {
	; CHECK-LIBCALL-LABEL: test_fptoui_i64:			; CHECK-LIBCALL-LABEL: test_fptoui_i64:
	; CHECK-LIBCALL: # %bb.0:			; CHECK-LIBCALL: # %bb.0:
	; CHECK-LIBCALL-NEXT: pushq %rax			; CHECK-LIBCALL-NEXT: pushq %rax
	; CHECK-LIBCALL-NEXT: movzwl (%rdi), %edi			; CHECK-LIBCALL-NEXT: movzwl (%rdi), %edi
	; CHECK-LIBCALL-NEXT: callq __gnu_h2f_ieee@PLT			; CHECK-LIBCALL-NEXT: callq __gnu_h2f_ieee@PLT
	; CHECK-LIBCALL-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; CHECK-LIBCALL-NEXT: cvttss2si %xmm0, %rcx
	; CHECK-LIBCALL-NEXT: movaps %xmm0, %xmm2			; CHECK-LIBCALL-NEXT: movq %rcx, %rdx
	; CHECK-LIBCALL-NEXT: subss %xmm1, %xmm2			; CHECK-LIBCALL-NEXT: sarq $63, %rdx
	; CHECK-LIBCALL-NEXT: cvttss2si %xmm2, %rax			; CHECK-LIBCALL-NEXT: subss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; CHECK-LIBCALL-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
	; CHECK-LIBCALL-NEXT: xorq %rax, %rcx
	; CHECK-LIBCALL-NEXT: cvttss2si %xmm0, %rax			; CHECK-LIBCALL-NEXT: cvttss2si %xmm0, %rax
	; CHECK-LIBCALL-NEXT: ucomiss %xmm1, %xmm0			; CHECK-LIBCALL-NEXT: andq %rdx, %rax
	; CHECK-LIBCALL-NEXT: cmovaeq %rcx, %rax			; CHECK-LIBCALL-NEXT: orq %rcx, %rax
	; CHECK-LIBCALL-NEXT: popq %rcx			; CHECK-LIBCALL-NEXT: popq %rcx
	; CHECK-LIBCALL-NEXT: retq			; CHECK-LIBCALL-NEXT: retq
	;			;
	; BWON-F16C-LABEL: test_fptoui_i64:			; BWON-F16C-LABEL: test_fptoui_i64:
	; BWON-F16C: # %bb.0:			; BWON-F16C: # %bb.0:
	; BWON-F16C-NEXT: movzwl (%rdi), %eax			; BWON-F16C-NEXT: movzwl (%rdi), %eax
	; BWON-F16C-NEXT: vmovd %eax, %xmm0			; BWON-F16C-NEXT: vmovd %eax, %xmm0
	; BWON-F16C-NEXT: vcvtph2ps %xmm0, %xmm0			; BWON-F16C-NEXT: vcvtph2ps %xmm0, %xmm0
	; BWON-F16C-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; BWON-F16C-NEXT: vcvttss2si %xmm0, %rcx
	; BWON-F16C-NEXT: vsubss %xmm1, %xmm0, %xmm2			; BWON-F16C-NEXT: movq %rcx, %rdx
	; BWON-F16C-NEXT: vcvttss2si %xmm2, %rax			; BWON-F16C-NEXT: sarq $63, %rdx
	; BWON-F16C-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000			; BWON-F16C-NEXT: vsubss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; BWON-F16C-NEXT: xorq %rax, %rcx
	; BWON-F16C-NEXT: vcvttss2si %xmm0, %rax			; BWON-F16C-NEXT: vcvttss2si %xmm0, %rax
	; BWON-F16C-NEXT: vucomiss %xmm1, %xmm0			; BWON-F16C-NEXT: andq %rdx, %rax
	; BWON-F16C-NEXT: cmovaeq %rcx, %rax			; BWON-F16C-NEXT: orq %rcx, %rax
	; BWON-F16C-NEXT: retq			; BWON-F16C-NEXT: retq
	;			;
	; CHECK-I686-LABEL: test_fptoui_i64:			; CHECK-I686-LABEL: test_fptoui_i64:
	; CHECK-I686: # %bb.0:			; CHECK-I686: # %bb.0:
	; CHECK-I686-NEXT: subl $12, %esp			; CHECK-I686-NEXT: subl $12, %esp
	; CHECK-I686-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-I686-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-I686-NEXT: movzwl (%eax), %eax			; CHECK-I686-NEXT: movzwl (%eax), %eax
	; CHECK-I686-NEXT: movl %eax, (%esp)			; CHECK-I686-NEXT: movl %eax, (%esp)

llvm/test/CodeGen/X86/scalar-fp-to-i32.ll

	; X86-AVX512-NEXT: vcvttss2usi {{[0-9]+}}(%esp), %eax			; X86-AVX512-NEXT: vcvttss2usi {{[0-9]+}}(%esp), %eax
	; X86-AVX512-NEXT: retl			; X86-AVX512-NEXT: retl
	;			;
	; X64-AVX512-LABEL: f_to_u32:			; X64-AVX512-LABEL: f_to_u32:
	; X64-AVX512: # %bb.0:			; X64-AVX512: # %bb.0:
	; X64-AVX512-NEXT: vcvttss2usi %xmm0, %eax			; X64-AVX512-NEXT: vcvttss2usi %xmm0, %eax
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	;			;
	; X86-SSE3-WIN-LABEL: f_to_u32:			; X86-SSE-WIN-LABEL: f_to_u32:
	; X86-SSE3-WIN: # %bb.0:			; X86-SSE-WIN: # %bb.0:
	; X86-SSE3-WIN-NEXT: pushl %ebp			; X86-SSE-WIN-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X86-SSE3-WIN-NEXT: movl %esp, %ebp			; X86-SSE-WIN-NEXT: cvttss2si %xmm0, %ecx
	; X86-SSE3-WIN-NEXT: andl $-8, %esp			; X86-SSE-WIN-NEXT: movl %ecx, %edx
	; X86-SSE3-WIN-NEXT: subl $8, %esp			; X86-SSE-WIN-NEXT: sarl $31, %edx
	; X86-SSE3-WIN-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X86-SSE-WIN-NEXT: subss __real@4f000000, %xmm0
	; X86-SSE3-WIN-NEXT: movss %xmm0, (%esp)			; X86-SSE-WIN-NEXT: cvttss2si %xmm0, %eax
	; X86-SSE3-WIN-NEXT: flds (%esp)			; X86-SSE-WIN-NEXT: andl %edx, %eax
	; X86-SSE3-WIN-NEXT: fisttpll (%esp)			; X86-SSE-WIN-NEXT: orl %ecx, %eax
	; X86-SSE3-WIN-NEXT: movl (%esp), %eax			; X86-SSE-WIN-NEXT: retl
	; X86-SSE3-WIN-NEXT: movl %ebp, %esp
	; X86-SSE3-WIN-NEXT: popl %ebp
	; X86-SSE3-WIN-NEXT: retl
	;			;
	; X86-SSE3-LIN-LABEL: f_to_u32:			; X86-SSE-LIN-LABEL: f_to_u32:
	; X86-SSE3-LIN: # %bb.0:			; X86-SSE-LIN: # %bb.0:
	; X86-SSE3-LIN-NEXT: subl $12, %esp			; X86-SSE-LIN-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X86-SSE3-LIN-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X86-SSE-LIN-NEXT: cvttss2si %xmm0, %ecx
	; X86-SSE3-LIN-NEXT: movss %xmm0, (%esp)			; X86-SSE-LIN-NEXT: movl %ecx, %edx
	; X86-SSE3-LIN-NEXT: flds (%esp)			; X86-SSE-LIN-NEXT: sarl $31, %edx
	; X86-SSE3-LIN-NEXT: fisttpll (%esp)			; X86-SSE-LIN-NEXT: subss {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE3-LIN-NEXT: movl (%esp), %eax			; X86-SSE-LIN-NEXT: cvttss2si %xmm0, %eax
	; X86-SSE3-LIN-NEXT: addl $12, %esp			; X86-SSE-LIN-NEXT: andl %edx, %eax
	; X86-SSE3-LIN-NEXT: retl			; X86-SSE-LIN-NEXT: orl %ecx, %eax
				; X86-SSE-LIN-NEXT: retl
	;			;
	; X64-SSE-LABEL: f_to_u32:			; X64-SSE-LABEL: f_to_u32:
	; X64-SSE: # %bb.0:			; X64-SSE: # %bb.0:
	; X64-SSE-NEXT: cvttss2si %xmm0, %rax			; X64-SSE-NEXT: cvttss2si %xmm0, %rax
	; X64-SSE-NEXT: # kill: def $eax killed $eax killed $rax			; X64-SSE-NEXT: # kill: def $eax killed $eax killed $rax
	; X64-SSE-NEXT: retq			; X64-SSE-NEXT: retq
	;			;
	; X86-SSE2-LABEL: f_to_u32:
	; X86-SSE2: # %bb.0:
	; X86-SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X86-SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; X86-SSE2-NEXT: movaps %xmm0, %xmm2
	; X86-SSE2-NEXT: subss %xmm1, %xmm2
	; X86-SSE2-NEXT: cvttss2si %xmm2, %ecx
	; X86-SSE2-NEXT: xorl $-2147483648, %ecx # imm = 0x80000000
	; X86-SSE2-NEXT: cvttss2si %xmm0, %eax
	; X86-SSE2-NEXT: ucomiss %xmm0, %xmm1
	; X86-SSE2-NEXT: cmovbel %ecx, %eax
	; X86-SSE2-NEXT: retl
	;
	; X86-SSE1-LABEL: f_to_u32:
	; X86-SSE1: # %bb.0:
	; X86-SSE1-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X86-SSE1-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; X86-SSE1-NEXT: movaps %xmm0, %xmm2
	; X86-SSE1-NEXT: subss %xmm1, %xmm2
	; X86-SSE1-NEXT: cvttss2si %xmm2, %ecx
	; X86-SSE1-NEXT: xorl $-2147483648, %ecx # imm = 0x80000000
	; X86-SSE1-NEXT: cvttss2si %xmm0, %eax
	; X86-SSE1-NEXT: ucomiss %xmm0, %xmm1
	; X86-SSE1-NEXT: cmovbel %ecx, %eax
	; X86-SSE1-NEXT: retl
	;
	; X87-WIN-LABEL: f_to_u32:			; X87-WIN-LABEL: f_to_u32:
	; X87-WIN: # %bb.0:			; X87-WIN: # %bb.0:
	; X87-WIN-NEXT: pushl %ebp			; X87-WIN-NEXT: pushl %ebp
	; X87-WIN-NEXT: movl %esp, %ebp			; X87-WIN-NEXT: movl %esp, %ebp
	; X87-WIN-NEXT: andl $-8, %esp			; X87-WIN-NEXT: andl $-8, %esp
	; X87-WIN-NEXT: subl $16, %esp			; X87-WIN-NEXT: subl $16, %esp
	; X87-WIN-NEXT: flds 8(%ebp)			; X87-WIN-NEXT: flds 8(%ebp)
	; X87-WIN-NEXT: fnstcw {{[0-9]+}}(%esp)			; X87-WIN-NEXT: fnstcw {{[0-9]+}}(%esp)
	[73 lines not shown]
	;			;
	; X64-AVX512-LABEL: d_to_u32:			; X64-AVX512-LABEL: d_to_u32:
	; X64-AVX512: # %bb.0:			; X64-AVX512: # %bb.0:
	; X64-AVX512-NEXT: vcvttsd2usi %xmm0, %eax			; X64-AVX512-NEXT: vcvttsd2usi %xmm0, %eax
	; X64-AVX512-NEXT: retq			; X64-AVX512-NEXT: retq
	;			;
	; X86-SSE3-WIN-LABEL: d_to_u32:			; X86-SSE3-WIN-LABEL: d_to_u32:
	; X86-SSE3-WIN: # %bb.0:			; X86-SSE3-WIN: # %bb.0:
	; X86-SSE3-WIN-NEXT: pushl %ebp
	; X86-SSE3-WIN-NEXT: movl %esp, %ebp
	; X86-SSE3-WIN-NEXT: andl $-8, %esp
	; X86-SSE3-WIN-NEXT: subl $8, %esp
	; X86-SSE3-WIN-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X86-SSE3-WIN-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X86-SSE3-WIN-NEXT: movsd %xmm0, (%esp)			; X86-SSE3-WIN-NEXT: cvttsd2si %xmm0, %ecx
	; X86-SSE3-WIN-NEXT: fldl (%esp)			; X86-SSE3-WIN-NEXT: movl %ecx, %edx
	; X86-SSE3-WIN-NEXT: fisttpll (%esp)			; X86-SSE3-WIN-NEXT: sarl $31, %edx
	; X86-SSE3-WIN-NEXT: movl (%esp), %eax			; X86-SSE3-WIN-NEXT: subsd __real@41e0000000000000, %xmm0
	; X86-SSE3-WIN-NEXT: movl %ebp, %esp			; X86-SSE3-WIN-NEXT: cvttsd2si %xmm0, %eax
	; X86-SSE3-WIN-NEXT: popl %ebp			; X86-SSE3-WIN-NEXT: andl %edx, %eax
				; X86-SSE3-WIN-NEXT: orl %ecx, %eax
	; X86-SSE3-WIN-NEXT: retl			; X86-SSE3-WIN-NEXT: retl
	;			;
	; X86-SSE3-LIN-LABEL: d_to_u32:			; X86-SSE3-LIN-LABEL: d_to_u32:
	; X86-SSE3-LIN: # %bb.0:			; X86-SSE3-LIN: # %bb.0:
	; X86-SSE3-LIN-NEXT: subl $12, %esp
	; X86-SSE3-LIN-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X86-SSE3-LIN-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X86-SSE3-LIN-NEXT: movsd %xmm0, (%esp)			; X86-SSE3-LIN-NEXT: cvttsd2si %xmm0, %ecx
	; X86-SSE3-LIN-NEXT: fldl (%esp)			; X86-SSE3-LIN-NEXT: movl %ecx, %edx
	; X86-SSE3-LIN-NEXT: fisttpll (%esp)			; X86-SSE3-LIN-NEXT: sarl $31, %edx
	; X86-SSE3-LIN-NEXT: movl (%esp), %eax			; X86-SSE3-LIN-NEXT: subsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE3-LIN-NEXT: addl $12, %esp			; X86-SSE3-LIN-NEXT: cvttsd2si %xmm0, %eax
				; X86-SSE3-LIN-NEXT: andl %edx, %eax
				; X86-SSE3-LIN-NEXT: orl %ecx, %eax
	; X86-SSE3-LIN-NEXT: retl			; X86-SSE3-LIN-NEXT: retl
	;			;
	; X64-SSE-LABEL: d_to_u32:			; X64-SSE-LABEL: d_to_u32:
	; X64-SSE: # %bb.0:			; X64-SSE: # %bb.0:
	; X64-SSE-NEXT: cvttsd2si %xmm0, %rax			; X64-SSE-NEXT: cvttsd2si %xmm0, %rax
	; X64-SSE-NEXT: # kill: def $eax killed $eax killed $rax			; X64-SSE-NEXT: # kill: def $eax killed $eax killed $rax
	; X64-SSE-NEXT: retq			; X64-SSE-NEXT: retq
	;			;
	; X86-SSE2-LABEL: d_to_u32:			; X86-SSE2-WIN-LABEL: d_to_u32:
	; X86-SSE2: # %bb.0:			; X86-SSE2-WIN: # %bb.0:
	; X86-SSE2-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X86-SSE2-WIN-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X86-SSE2-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; X86-SSE2-WIN-NEXT: cvttsd2si %xmm0, %ecx
	; X86-SSE2-NEXT: movapd %xmm0, %xmm2			; X86-SSE2-WIN-NEXT: movl %ecx, %edx
	; X86-SSE2-NEXT: subsd %xmm1, %xmm2			; X86-SSE2-WIN-NEXT: sarl $31, %edx
	; X86-SSE2-NEXT: cvttsd2si %xmm2, %ecx			; X86-SSE2-WIN-NEXT: subsd __real@41e0000000000000, %xmm0
	; X86-SSE2-NEXT: xorl $-2147483648, %ecx # imm = 0x80000000			; X86-SSE2-WIN-NEXT: cvttsd2si %xmm0, %eax
	; X86-SSE2-NEXT: cvttsd2si %xmm0, %eax			; X86-SSE2-WIN-NEXT: andl %edx, %eax
	; X86-SSE2-NEXT: ucomisd %xmm0, %xmm1			; X86-SSE2-WIN-NEXT: orl %ecx, %eax
	; X86-SSE2-NEXT: cmovbel %ecx, %eax			; X86-SSE2-WIN-NEXT: retl
	; X86-SSE2-NEXT: retl			;
				; X86-SSE2-LIN-LABEL: d_to_u32:
				; X86-SSE2-LIN: # %bb.0:
				; X86-SSE2-LIN-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
				; X86-SSE2-LIN-NEXT: cvttsd2si %xmm0, %ecx
				; X86-SSE2-LIN-NEXT: movl %ecx, %edx
				; X86-SSE2-LIN-NEXT: sarl $31, %edx
				; X86-SSE2-LIN-NEXT: subsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
				; X86-SSE2-LIN-NEXT: cvttsd2si %xmm0, %eax
				; X86-SSE2-LIN-NEXT: andl %edx, %eax
				; X86-SSE2-LIN-NEXT: orl %ecx, %eax
				; X86-SSE2-LIN-NEXT: retl
	;			;
	; X86-SSE1-WIN-LABEL: d_to_u32:			; X86-SSE1-WIN-LABEL: d_to_u32:
	; X86-SSE1-WIN: # %bb.0:			; X86-SSE1-WIN: # %bb.0:
	; X86-SSE1-WIN-NEXT: pushl %ebp			; X86-SSE1-WIN-NEXT: pushl %ebp
	; X86-SSE1-WIN-NEXT: movl %esp, %ebp			; X86-SSE1-WIN-NEXT: movl %esp, %ebp
	; X86-SSE1-WIN-NEXT: andl $-8, %esp			; X86-SSE1-WIN-NEXT: andl $-8, %esp
	; X86-SSE1-WIN-NEXT: subl $16, %esp			; X86-SSE1-WIN-NEXT: subl $16, %esp
	; X86-SSE1-WIN-NEXT: fldl 8(%ebp)			; X86-SSE1-WIN-NEXT: fldl 8(%ebp)
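
The new f_to_u32 check lines above (cvttss2si, sarl $31, subss, cvttss2si, andl, orl) follow the formula from the summary: small | (CVTTSS2SI(x - 2^31) & (small >> 31)). A minimal C sketch of that scalar sequence is given here, assuming a software emulation of the instruction rather than the real intrinsic. The function names cvtt_f32_i32 and fp_to_u32 are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical helper: emulates CVTTSS2SI with a 32-bit destination.
   Truncates toward zero; NaN and out-of-range inputs yield 0x80000000. */
static int32_t cvtt_f32_i32(float x) {
    if (!(x >= -2147483648.0f && x < 2147483648.0f))
        return INT32_MIN;
    return (int32_t)x;
}

/* Unsigned conversion built from two signed conversions and a mask. */
static uint32_t fp_to_u32(float x) {
    int32_t small = cvtt_f32_i32(x);                   /* cvttss2si         */
    /* All ones when small is negative, else zero: same value as sarl $31. */
    int32_t mask = -(int32_t)((uint32_t)small >> 31);
    int32_t big = cvtt_f32_i32(x - 2147483648.0f);     /* subss, cvttss2si  */
    return (uint32_t)(small | (big & mask));           /* andl, orl         */
}
```

When x is below 2^31 the mask is zero and the first conversion is returned unchanged. When x is 2^31 or above, the first conversion yields 0x80000000, which both enables the second conversion and supplies the top bit of the result.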

llvm/test/CodeGen/X86/scalar-fp-to-i64.ll

	; X86-SSE3-LIN-NEXT: setbe %al			; X86-SSE3-LIN-NEXT: setbe %al
	; X86-SSE3-LIN-NEXT: movzbl %al, %edx			; X86-SSE3-LIN-NEXT: movzbl %al, %edx
	; X86-SSE3-LIN-NEXT: shll $31, %edx			; X86-SSE3-LIN-NEXT: shll $31, %edx
	; X86-SSE3-LIN-NEXT: xorl {{[0-9]+}}(%esp), %edx			; X86-SSE3-LIN-NEXT: xorl {{[0-9]+}}(%esp), %edx
	; X86-SSE3-LIN-NEXT: movl (%esp), %eax			; X86-SSE3-LIN-NEXT: movl (%esp), %eax
	; X86-SSE3-LIN-NEXT: addl $12, %esp			; X86-SSE3-LIN-NEXT: addl $12, %esp
	; X86-SSE3-LIN-NEXT: retl			; X86-SSE3-LIN-NEXT: retl
	;			;
	; X64-SSE-LABEL: f_to_u64:			; X64-SSE-WIN-LABEL: f_to_u64:
	; X64-SSE: # %bb.0:			; X64-SSE-WIN: # %bb.0:
	; X64-SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; X64-SSE-WIN-NEXT: cvttss2si %xmm0, %rcx
	; X64-SSE-NEXT: movaps %xmm0, %xmm2			; X64-SSE-WIN-NEXT: movq %rcx, %rdx
	; X64-SSE-NEXT: subss %xmm1, %xmm2			; X64-SSE-WIN-NEXT: sarq $63, %rdx
	; X64-SSE-NEXT: cvttss2si %xmm2, %rax			; X64-SSE-WIN-NEXT: subss __real@5f000000(%rip), %xmm0
	; X64-SSE-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000			; X64-SSE-WIN-NEXT: cvttss2si %xmm0, %rax
	; X64-SSE-NEXT: xorq %rax, %rcx			; X64-SSE-WIN-NEXT: andq %rdx, %rax
	; X64-SSE-NEXT: cvttss2si %xmm0, %rax			; X64-SSE-WIN-NEXT: orq %rcx, %rax
	; X64-SSE-NEXT: ucomiss %xmm1, %xmm0			; X64-SSE-WIN-NEXT: retq
	; X64-SSE-NEXT: cmovaeq %rcx, %rax			;
	; X64-SSE-NEXT: retq			; X64-SSE-LIN-LABEL: f_to_u64:
				; X64-SSE-LIN: # %bb.0:
				; X64-SSE-LIN-NEXT: cvttss2si %xmm0, %rcx
				; X64-SSE-LIN-NEXT: movq %rcx, %rdx
				; X64-SSE-LIN-NEXT: sarq $63, %rdx
				; X64-SSE-LIN-NEXT: subss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
				; X64-SSE-LIN-NEXT: cvttss2si %xmm0, %rax
				; X64-SSE-LIN-NEXT: andq %rdx, %rax
				; X64-SSE-LIN-NEXT: orq %rcx, %rax
				; X64-SSE-LIN-NEXT: retq
	;			;
	; X86-SSE2-WIN-LABEL: f_to_u64:			; X86-SSE2-WIN-LABEL: f_to_u64:
	; X86-SSE2-WIN: # %bb.0:			; X86-SSE2-WIN: # %bb.0:
	; X86-SSE2-WIN-NEXT: pushl %ebp			; X86-SSE2-WIN-NEXT: pushl %ebp
	; X86-SSE2-WIN-NEXT: movl %esp, %ebp			; X86-SSE2-WIN-NEXT: movl %esp, %ebp
	; X86-SSE2-WIN-NEXT: andl $-8, %esp			; X86-SSE2-WIN-NEXT: andl $-8, %esp
	; X86-SSE2-WIN-NEXT: subl $16, %esp			; X86-SSE2-WIN-NEXT: subl $16, %esp
	; X86-SSE2-WIN-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X86-SSE2-WIN-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	[396 lines not shown]
	; X86-SSE3-LIN-NEXT: setbe %al			; X86-SSE3-LIN-NEXT: setbe %al
	; X86-SSE3-LIN-NEXT: movzbl %al, %edx			; X86-SSE3-LIN-NEXT: movzbl %al, %edx
	; X86-SSE3-LIN-NEXT: shll $31, %edx			; X86-SSE3-LIN-NEXT: shll $31, %edx
	; X86-SSE3-LIN-NEXT: xorl {{[0-9]+}}(%esp), %edx			; X86-SSE3-LIN-NEXT: xorl {{[0-9]+}}(%esp), %edx
	; X86-SSE3-LIN-NEXT: movl (%esp), %eax			; X86-SSE3-LIN-NEXT: movl (%esp), %eax
	; X86-SSE3-LIN-NEXT: addl $12, %esp			; X86-SSE3-LIN-NEXT: addl $12, %esp
	; X86-SSE3-LIN-NEXT: retl			; X86-SSE3-LIN-NEXT: retl
	;			;
	; X64-SSE-LABEL: d_to_u64:			; X64-SSE-WIN-LABEL: d_to_u64:
	; X64-SSE: # %bb.0:			; X64-SSE-WIN: # %bb.0:
	; X64-SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; X64-SSE-WIN-NEXT: cvttsd2si %xmm0, %rcx
	; X64-SSE-NEXT: movapd %xmm0, %xmm2			; X64-SSE-WIN-NEXT: movq %rcx, %rdx
	; X64-SSE-NEXT: subsd %xmm1, %xmm2			; X64-SSE-WIN-NEXT: sarq $63, %rdx
	; X64-SSE-NEXT: cvttsd2si %xmm2, %rax			; X64-SSE-WIN-NEXT: subsd __real@43e0000000000000(%rip), %xmm0
	; X64-SSE-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000			; X64-SSE-WIN-NEXT: cvttsd2si %xmm0, %rax
	; X64-SSE-NEXT: xorq %rax, %rcx			; X64-SSE-WIN-NEXT: andq %rdx, %rax
	; X64-SSE-NEXT: cvttsd2si %xmm0, %rax			; X64-SSE-WIN-NEXT: orq %rcx, %rax
	; X64-SSE-NEXT: ucomisd %xmm1, %xmm0			; X64-SSE-WIN-NEXT: retq
	; X64-SSE-NEXT: cmovaeq %rcx, %rax			;
	; X64-SSE-NEXT: retq			; X64-SSE-LIN-LABEL: d_to_u64:
				; X64-SSE-LIN: # %bb.0:
				; X64-SSE-LIN-NEXT: cvttsd2si %xmm0, %rcx
				; X64-SSE-LIN-NEXT: movq %rcx, %rdx
				; X64-SSE-LIN-NEXT: sarq $63, %rdx
				; X64-SSE-LIN-NEXT: subsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
				; X64-SSE-LIN-NEXT: cvttsd2si %xmm0, %rax
				; X64-SSE-LIN-NEXT: andq %rdx, %rax
				; X64-SSE-LIN-NEXT: orq %rcx, %rax
				; X64-SSE-LIN-NEXT: retq
	;			;
	; X86-SSE2-WIN-LABEL: d_to_u64:			; X86-SSE2-WIN-LABEL: d_to_u64:
	; X86-SSE2-WIN: # %bb.0:			; X86-SSE2-WIN: # %bb.0:
	; X86-SSE2-WIN-NEXT: pushl %ebp			; X86-SSE2-WIN-NEXT: pushl %ebp
	; X86-SSE2-WIN-NEXT: movl %esp, %ebp			; X86-SSE2-WIN-NEXT: movl %esp, %ebp
	; X86-SSE2-WIN-NEXT: andl $-8, %esp			; X86-SSE2-WIN-NEXT: andl $-8, %esp
	; X86-SSE2-WIN-NEXT: subl $16, %esp			; X86-SSE2-WIN-NEXT: subl $16, %esp
	; X86-SSE2-WIN-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X86-SSE2-WIN-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero

llvm/test/CodeGen/X86/vec_cast3.ll

	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%res = fptoui <2 x float> %src to <2 x i16>			%res = fptoui <2 x float> %src to <2 x i16>
	ret <2 x i16> %res			ret <2 x i16> %res
	}			}

	define <2 x i32> @cvt_v2f32_v2u32(<2 x float> %src) {			define <2 x i32> @cvt_v2f32_v2u32(<2 x float> %src) {
	; CHECK-LABEL: cvt_v2f32_v2u32:			; CHECK-LABEL: cvt_v2f32_v2u32:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	; CHECK-NEXT: vmovaps {{.*#+}} xmm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]			; CHECK-NEXT: vcvttps2dq %xmm0, %xmm1
	; CHECK-NEXT: vcmpltps %xmm1, %xmm0, %xmm2			; CHECK-NEXT: vpsrad $31, %xmm1, %xmm2
	; CHECK-NEXT: vsubps %xmm1, %xmm0, %xmm1			; CHECK-NEXT: vsubps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
	; CHECK-NEXT: vcvttps2dq %xmm1, %xmm1
	; CHECK-NEXT: vxorps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1, %xmm1
	; CHECK-NEXT: vcvttps2dq %xmm0, %xmm0			; CHECK-NEXT: vcvttps2dq %xmm0, %xmm0
	; CHECK-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0			; CHECK-NEXT: vpand %xmm2, %xmm0, %xmm0
				; CHECK-NEXT: vpor %xmm0, %xmm1, %xmm0
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%res = fptoui <2 x float> %src to <2 x i32>			%res = fptoui <2 x float> %src to <2 x i32>
	ret <2 x i32> %res			ret <2 x i32> %res
	}			}

	define <32 x i8> @PR40146(<4 x i64> %x) {			define <32 x i8> @PR40146(<4 x i64> %x) {
	; CHECK-LABEL: PR40146:			; CHECK-LABEL: PR40146:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:

llvm/test/CodeGen/X86/vec_fp_to_int.ll


define <2 x i64> @fptoui_2f64_to_2i64(<2 x double> %a) {		define <2 x i64> @fptoui_2f64_to_2i64(<2 x double> %a) {
; SSE-LABEL: fptoui_2f64_to_2i64:		; SSE-LABEL: fptoui_2f64_to_2i64:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero		; SSE-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
; SSE-NEXT: movapd %xmm0, %xmm1		; SSE-NEXT: movapd %xmm0, %xmm1
; SSE-NEXT: subsd %xmm2, %xmm1		; SSE-NEXT: subsd %xmm2, %xmm1
; SSE-NEXT: cvttsd2si %xmm1, %rax		; SSE-NEXT: cvttsd2si %xmm1, %rax
; SSE-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; SSE-NEXT: cvttsd2si %xmm0, %rcx
; SSE-NEXT: xorq %rcx, %rax		; SSE-NEXT: movq %rcx, %rdx
; SSE-NEXT: cvttsd2si %xmm0, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomisd %xmm2, %xmm0		; SSE-NEXT: andq %rax, %rdx
; SSE-NEXT: cmovaeq %rax, %rdx		; SSE-NEXT: orq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm1		; SSE-NEXT: movq %rdx, %xmm1
; SSE-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]		; SSE-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
; SSE-NEXT: movapd %xmm0, %xmm3		; SSE-NEXT: cvttsd2si %xmm0, %rax
; SSE-NEXT: subsd %xmm2, %xmm3		; SSE-NEXT: subsd %xmm2, %xmm0
; SSE-NEXT: cvttsd2si %xmm3, %rax
; SSE-NEXT: xorq %rcx, %rax
; SSE-NEXT: cvttsd2si %xmm0, %rcx		; SSE-NEXT: cvttsd2si %xmm0, %rcx
; SSE-NEXT: ucomisd %xmm2, %xmm0		; SSE-NEXT: movq %rax, %rdx
; SSE-NEXT: cmovaeq %rax, %rcx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: movq %rcx, %xmm0		; SSE-NEXT: andq %rcx, %rdx
		; SSE-NEXT: orq %rax, %rdx
		; SSE-NEXT: movq %rdx, %xmm0
; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
; SSE-NEXT: movdqa %xmm1, %xmm0		; SSE-NEXT: movdqa %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; VEX-LABEL: fptoui_2f64_to_2i64:		; VEX-LABEL: fptoui_2f64_to_2i64:
; VEX: # %bb.0:		; VEX: # %bb.0:
; VEX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero		; VEX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
; VEX-NEXT: vsubsd %xmm1, %xmm0, %xmm2		; VEX-NEXT: vsubsd %xmm1, %xmm0, %xmm2
; VEX-NEXT: vcvttsd2si %xmm2, %rax		; VEX-NEXT: vcvttsd2si %xmm2, %rax
; VEX-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; VEX-NEXT: vcvttsd2si %xmm0, %rcx
; VEX-NEXT: xorq %rcx, %rax		; VEX-NEXT: movq %rcx, %rdx
; VEX-NEXT: vcvttsd2si %xmm0, %rdx		; VEX-NEXT: sarq $63, %rdx
; VEX-NEXT: vucomisd %xmm1, %xmm0		; VEX-NEXT: andq %rax, %rdx
; VEX-NEXT: cmovaeq %rax, %rdx		; VEX-NEXT: orq %rcx, %rdx
; VEX-NEXT: vmovq %rdx, %xmm2		; VEX-NEXT: vmovq %rdx, %xmm2
; VEX-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]		; VEX-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
; VEX-NEXT: vsubsd %xmm1, %xmm0, %xmm3		; VEX-NEXT: vsubsd %xmm1, %xmm0, %xmm1
; VEX-NEXT: vcvttsd2si %xmm3, %rax		; VEX-NEXT: vcvttsd2si %xmm1, %rax
; VEX-NEXT: xorq %rcx, %rax
; VEX-NEXT: vcvttsd2si %xmm0, %rcx		; VEX-NEXT: vcvttsd2si %xmm0, %rcx
; VEX-NEXT: vucomisd %xmm1, %xmm0		; VEX-NEXT: movq %rcx, %rdx
; VEX-NEXT: cmovaeq %rax, %rcx		; VEX-NEXT: sarq $63, %rdx
; VEX-NEXT: vmovq %rcx, %xmm0		; VEX-NEXT: andq %rax, %rdx
		; VEX-NEXT: orq %rcx, %rdx
		; VEX-NEXT: vmovq %rdx, %xmm0
; VEX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]		; VEX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
; VEX-NEXT: retq		; VEX-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_2f64_to_2i64:		; AVX512F-LABEL: fptoui_2f64_to_2i64:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: vcvttsd2usi %xmm0, %rax		; AVX512F-NEXT: vcvttsd2usi %xmm0, %rax
; AVX512F-NEXT: vmovq %rax, %xmm1		; AVX512F-NEXT: vmovq %rax, %xmm1
; AVX512F-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]		; AVX512F-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
[26 lines not shown]
; AVX512VLDQ-NEXT: retq		; AVX512VLDQ-NEXT: retq
%cvt = fptoui <2 x double> %a to <2 x i64>		%cvt = fptoui <2 x double> %a to <2 x i64>
ret <2 x i64> %cvt		ret <2 x i64> %cvt
}		}

define <4 x i32> @fptoui_2f64_to_4i32(<2 x double> %a) {		define <4 x i32> @fptoui_2f64_to_4i32(<2 x double> %a) {
; SSE-LABEL: fptoui_2f64_to_4i32:		; SSE-LABEL: fptoui_2f64_to_4i32:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: cvttsd2si %xmm0, %rax		; SSE-NEXT: cvttpd2dq %xmm0, %xmm1
; SSE-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]		; SSE-NEXT: movapd %xmm1, %xmm2
; SSE-NEXT: cvttsd2si %xmm0, %rcx		; SSE-NEXT: psrad $31, %xmm2
; SSE-NEXT: movd %eax, %xmm0		; SSE-NEXT: addpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE-NEXT: movd %ecx, %xmm1		; SSE-NEXT: cvttpd2dq %xmm0, %xmm0
; SSE-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]		; SSE-NEXT: andpd %xmm2, %xmm0
; SSE-NEXT: movq {{.*#+}} xmm0 = xmm0[0],zero		; SSE-NEXT: orpd %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: fptoui_2f64_to_4i32:		; VEX-LABEL: fptoui_2f64_to_4i32:
; AVX1: # %bb.0:		; VEX: # %bb.0:
; AVX1-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0		; VEX-NEXT: vcvttpd2dq %xmm0, %xmm1
; AVX1-NEXT: vmovapd {{.*#+}} ymm1 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]		; VEX-NEXT: vpsrad $31, %xmm1, %xmm2
; AVX1-NEXT: vcmpltpd %ymm1, %ymm0, %ymm2		; VEX-NEXT: vaddpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX1-NEXT: vpackssdw %xmm2, %xmm2, %xmm2		; VEX-NEXT: vcvttpd2dq %xmm0, %xmm0
; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm3		; VEX-NEXT: vandpd %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vsubpd %ymm1, %ymm0, %ymm0		; VEX-NEXT: vorpd %xmm0, %xmm1, %xmm0
; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm0		; VEX-NEXT: retq
; AVX1-NEXT: vxorpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX1-NEXT: vblendvps %xmm2, %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero
; AVX1-NEXT: vzeroupper
; AVX1-NEXT: retq
;
; AVX2-LABEL: fptoui_2f64_to_4i32:
; AVX2: # %bb.0:
; AVX2-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
; AVX2-NEXT: vbroadcastsd {{.*#+}} ymm1 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]
; AVX2-NEXT: vcmpltpd %ymm1, %ymm0, %ymm2
; AVX2-NEXT: vpackssdw %xmm2, %xmm2, %xmm2
; AVX2-NEXT: vsubpd %ymm1, %ymm0, %ymm1
; AVX2-NEXT: vcvttpd2dq %ymm1, %xmm1
; AVX2-NEXT: vbroadcastss {{.*#+}} xmm3 = [2147483648,2147483648,2147483648,2147483648]
; AVX2-NEXT: vxorpd %xmm3, %xmm1, %xmm1
; AVX2-NEXT: vcvttpd2dq %ymm0, %xmm0
; AVX2-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; AVX2-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero
; AVX2-NEXT: vzeroupper
; AVX2-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_2f64_to_4i32:		; AVX512F-LABEL: fptoui_2f64_to_4i32:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
; AVX512F-NEXT: vcvttpd2udq %zmm0, %ymm0		; AVX512F-NEXT: vcvttpd2udq %zmm0, %ymm0
; AVX512F-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero		; AVX512F-NEXT: vmovq {{.*#+}} xmm0 = xmm0[0],zero
; AVX512F-NEXT: vzeroupper		; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq		; AVX512F-NEXT: retq
[18 lines not shown]
; AVX512VLDQ-NEXT: retq		; AVX512VLDQ-NEXT: retq
%cvt = fptoui <2 x double> %a to <2 x i32>		%cvt = fptoui <2 x double> %a to <2 x i32>
%ext = shufflevector <2 x i32> %cvt, <2 x i32> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		%ext = shufflevector <2 x i32> %cvt, <2 x i32> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
ret <4 x i32> %ext		ret <4 x i32> %ext
}		}

define <4 x i32> @fptoui_2f64_to_2i32(<2 x double> %a) {		define <4 x i32> @fptoui_2f64_to_2i32(<2 x double> %a) {
; SSE-LABEL: fptoui_2f64_to_2i32:		; SSE-LABEL: fptoui_2f64_to_2i32:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: cvttsd2si %xmm0, %rax		; SSE-NEXT: cvttpd2dq %xmm0, %xmm1
; SSE-NEXT: movd %eax, %xmm1		; SSE-NEXT: movapd %xmm1, %xmm2
; SSE-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]		; SSE-NEXT: psrad $31, %xmm2
; SSE-NEXT: cvttsd2si %xmm0, %rax		; SSE-NEXT: addpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE-NEXT: movd %eax, %xmm0		; SSE-NEXT: cvttpd2dq %xmm0, %xmm0
; SSE-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]		; SSE-NEXT: andpd %xmm2, %xmm0
; SSE-NEXT: movdqa %xmm1, %xmm0		; SSE-NEXT: orpd %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: fptoui_2f64_to_2i32:		; VEX-LABEL: fptoui_2f64_to_2i32:
; AVX1: # %bb.0:		; VEX: # %bb.0:
; AVX1-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0		; VEX-NEXT: vcvttpd2dq %xmm0, %xmm1
; AVX1-NEXT: vmovapd {{.*#+}} ymm1 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]		; VEX-NEXT: vpsrad $31, %xmm1, %xmm2
; AVX1-NEXT: vcmpltpd %ymm1, %ymm0, %ymm2		; VEX-NEXT: vaddpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3		; VEX-NEXT: vcvttpd2dq %xmm0, %xmm0
; AVX1-NEXT: vpackssdw %xmm3, %xmm2, %xmm2		; VEX-NEXT: vandpd %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm3		; VEX-NEXT: vorpd %xmm0, %xmm1, %xmm0
; AVX1-NEXT: vsubpd %ymm1, %ymm0, %ymm0		; VEX-NEXT: retq
; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm0
; AVX1-NEXT: vxorpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX1-NEXT: vblendvps %xmm2, %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vzeroupper
; AVX1-NEXT: retq
;
; AVX2-LABEL: fptoui_2f64_to_2i32:
; AVX2: # %bb.0:
; AVX2-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
; AVX2-NEXT: vbroadcastsd {{.*#+}} ymm1 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]
; AVX2-NEXT: vcmpltpd %ymm1, %ymm0, %ymm2
; AVX2-NEXT: vextractf128 $1, %ymm2, %xmm3
; AVX2-NEXT: vpackssdw %xmm3, %xmm2, %xmm2
; AVX2-NEXT: vsubpd %ymm1, %ymm0, %ymm1
; AVX2-NEXT: vcvttpd2dq %ymm1, %xmm1
; AVX2-NEXT: vbroadcastss {{.*#+}} xmm3 = [2147483648,2147483648,2147483648,2147483648]
; AVX2-NEXT: vxorpd %xmm3, %xmm1, %xmm1
; AVX2-NEXT: vcvttpd2dq %ymm0, %xmm0
; AVX2-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; AVX2-NEXT: vzeroupper
; AVX2-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_2f64_to_2i32:		; AVX512F-LABEL: fptoui_2f64_to_2i32:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
; AVX512F-NEXT: vcvttpd2udq %zmm0, %ymm0		; AVX512F-NEXT: vcvttpd2udq %zmm0, %ymm0
; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
; AVX512F-NEXT: vzeroupper		; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq		; AVX512F-NEXT: retq
[18 lines not shown]
; AVX512VLDQ-NEXT: retq		; AVX512VLDQ-NEXT: retq
%cvt = fptoui <2 x double> %a to <2 x i32>		%cvt = fptoui <2 x double> %a to <2 x i32>
%ext = shufflevector <2 x i32> %cvt, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		%ext = shufflevector <2 x i32> %cvt, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
ret <4 x i32> %ext		ret <4 x i32> %ext
}		}

define <4 x i32> @fptoui_4f64_to_2i32(<2 x double> %a) {		define <4 x i32> @fptoui_4f64_to_2i32(<2 x double> %a) {
; SSE-LABEL: fptoui_4f64_to_2i32:		; SSE-LABEL: fptoui_4f64_to_2i32:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: cvttsd2si %xmm0, %rax		; SSE-NEXT: cvttpd2dq %xmm0, %xmm1
; SSE-NEXT: movd %eax, %xmm1		; SSE-NEXT: movapd %xmm1, %xmm2
; SSE-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]		; SSE-NEXT: psrad $31, %xmm2
; SSE-NEXT: cvttsd2si %xmm0, %rax		; SSE-NEXT: addpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE-NEXT: movd %eax, %xmm0		; SSE-NEXT: cvttpd2dq %xmm0, %xmm0
; SSE-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]		; SSE-NEXT: andpd %xmm2, %xmm0
; SSE-NEXT: movq {{.*#+}} xmm0 = xmm1[0],zero		; SSE-NEXT: orpd %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: fptoui_4f64_to_2i32:		; AVX1-LABEL: fptoui_4f64_to_2i32:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vmovapd %xmm0, %xmm0		; AVX1-NEXT: vmovapd %xmm0, %xmm0
; AVX1-NEXT: vmovapd {{.*#+}} ymm1 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]		; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm1
; AVX1-NEXT: vcmpltpd %ymm1, %ymm0, %ymm2		; AVX1-NEXT: vpsrad $31, %xmm1, %xmm2
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3		; AVX1-NEXT: vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
; AVX1-NEXT: vpackssdw %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vsubpd %ymm1, %ymm0, %ymm1
; AVX1-NEXT: vcvttpd2dq %ymm1, %xmm1
; AVX1-NEXT: vxorpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm0		; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm0
; AVX1-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0		; AVX1-NEXT: vandpd %xmm2, %xmm0, %xmm0
		; AVX1-NEXT: vorpd %xmm0, %xmm1, %xmm0
; AVX1-NEXT: vzeroupper		; AVX1-NEXT: vzeroupper
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: fptoui_4f64_to_2i32:		; AVX2-LABEL: fptoui_4f64_to_2i32:
; AVX2: # %bb.0:		; AVX2: # %bb.0:
; AVX2-NEXT: vmovapd %xmm0, %xmm0		; AVX2-NEXT: vmovapd %xmm0, %xmm0
; AVX2-NEXT: vbroadcastsd {{.*#+}} ymm1 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]		; AVX2-NEXT: vbroadcastsd {{.*#+}} ymm1 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]
; AVX2-NEXT: vcmpltpd %ymm1, %ymm0, %ymm2
; AVX2-NEXT: vextractf128 $1, %ymm2, %xmm3
; AVX2-NEXT: vpackssdw %xmm3, %xmm2, %xmm2
; AVX2-NEXT: vsubpd %ymm1, %ymm0, %ymm1		; AVX2-NEXT: vsubpd %ymm1, %ymm0, %ymm1
; AVX2-NEXT: vcvttpd2dq %ymm1, %xmm1		; AVX2-NEXT: vcvttpd2dq %ymm1, %xmm1
; AVX2-NEXT: vbroadcastss {{.*#+}} xmm3 = [2147483648,2147483648,2147483648,2147483648]
; AVX2-NEXT: vxorpd %xmm3, %xmm1, %xmm1
; AVX2-NEXT: vcvttpd2dq %ymm0, %xmm0		; AVX2-NEXT: vcvttpd2dq %ymm0, %xmm0
; AVX2-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0		; AVX2-NEXT: vpsrad $31, %xmm0, %xmm2
		; AVX2-NEXT: vandpd %xmm2, %xmm1, %xmm1
		; AVX2-NEXT: vorpd %xmm1, %xmm0, %xmm0
; AVX2-NEXT: vzeroupper		; AVX2-NEXT: vzeroupper
; AVX2-NEXT: retq		; AVX2-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_4f64_to_2i32:		; AVX512F-LABEL: fptoui_4f64_to_2i32:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: vmovaps %xmm0, %xmm0		; AVX512F-NEXT: vmovaps %xmm0, %xmm0
; AVX512F-NEXT: vcvttpd2udq %zmm0, %ymm0		; AVX512F-NEXT: vcvttpd2udq %zmm0, %ymm0
; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
[27 lines not shown]
}		}

define <4 x i64> @fptoui_4f64_to_4i64(<4 x double> %a) {		define <4 x i64> @fptoui_4f64_to_4i64(<4 x double> %a) {
; SSE-LABEL: fptoui_4f64_to_4i64:		; SSE-LABEL: fptoui_4f64_to_4i64:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movapd %xmm0, %xmm2		; SSE-NEXT: movapd %xmm0, %xmm2
; SSE-NEXT: movsd {{.*#+}} xmm3 = mem[0],zero		; SSE-NEXT: movsd {{.*#+}} xmm3 = mem[0],zero
; SSE-NEXT: subsd %xmm3, %xmm0		; SSE-NEXT: subsd %xmm3, %xmm0
; SSE-NEXT: cvttsd2si %xmm0, %rcx		; SSE-NEXT: cvttsd2si %xmm0, %rax
; SSE-NEXT: movabsq $-9223372036854775808, %rax # imm = 0x8000000000000000		; SSE-NEXT: cvttsd2si %xmm2, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: movq %rcx, %rdx
; SSE-NEXT: cvttsd2si %xmm2, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomisd %xmm3, %xmm2		; SSE-NEXT: andq %rax, %rdx
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: orq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm0		; SSE-NEXT: movq %rdx, %xmm0
; SSE-NEXT: unpckhpd {{.*#+}} xmm2 = xmm2[1,1]		; SSE-NEXT: unpckhpd {{.*#+}} xmm2 = xmm2[1,1]
; SSE-NEXT: movapd %xmm2, %xmm4		; SSE-NEXT: cvttsd2si %xmm2, %rax
; SSE-NEXT: subsd %xmm3, %xmm4		; SSE-NEXT: subsd %xmm3, %xmm2
; SSE-NEXT: cvttsd2si %xmm4, %rcx		; SSE-NEXT: cvttsd2si %xmm2, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: movq %rax, %rdx
; SSE-NEXT: cvttsd2si %xmm2, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomisd %xmm3, %xmm2		; SSE-NEXT: andq %rcx, %rdx
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: orq %rax, %rdx
; SSE-NEXT: movq %rdx, %xmm2		; SSE-NEXT: movq %rdx, %xmm2
; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
; SSE-NEXT: movapd %xmm1, %xmm2		; SSE-NEXT: movapd %xmm1, %xmm2
; SSE-NEXT: subsd %xmm3, %xmm2		; SSE-NEXT: subsd %xmm3, %xmm2
; SSE-NEXT: cvttsd2si %xmm2, %rcx		; SSE-NEXT: cvttsd2si %xmm2, %rax
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: cvttsd2si %xmm1, %rcx
; SSE-NEXT: cvttsd2si %xmm1, %rdx		; SSE-NEXT: movq %rcx, %rdx
; SSE-NEXT: ucomisd %xmm3, %xmm1		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: andq %rax, %rdx
		; SSE-NEXT: orq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm2		; SSE-NEXT: movq %rdx, %xmm2
; SSE-NEXT: unpckhpd {{.*#+}} xmm1 = xmm1[1,1]		; SSE-NEXT: unpckhpd {{.*#+}} xmm1 = xmm1[1,1]
; SSE-NEXT: movapd %xmm1, %xmm4
; SSE-NEXT: subsd %xmm3, %xmm4
; SSE-NEXT: cvttsd2si %xmm4, %rcx
; SSE-NEXT: xorq %rax, %rcx
; SSE-NEXT: cvttsd2si %xmm1, %rax		; SSE-NEXT: cvttsd2si %xmm1, %rax
; SSE-NEXT: ucomisd %xmm3, %xmm1		; SSE-NEXT: subsd %xmm3, %xmm1
; SSE-NEXT: cmovaeq %rcx, %rax		; SSE-NEXT: cvttsd2si %xmm1, %rcx
; SSE-NEXT: movq %rax, %xmm1		; SSE-NEXT: movq %rax, %rdx
		; SSE-NEXT: sarq $63, %rdx
		; SSE-NEXT: andq %rcx, %rdx
		; SSE-NEXT: orq %rax, %rdx
		; SSE-NEXT: movq %rdx, %xmm1
; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm1[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm1[0]
; SSE-NEXT: movdqa %xmm2, %xmm1		; SSE-NEXT: movdqa %xmm2, %xmm1
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: fptoui_4f64_to_4i64:		; AVX1-LABEL: fptoui_4f64_to_4i64:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2		; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
; AVX1-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero		; AVX1-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
; AVX1-NEXT: vsubsd %xmm1, %xmm2, %xmm3		; AVX1-NEXT: vsubsd %xmm1, %xmm2, %xmm3
; AVX1-NEXT: vcvttsd2si %xmm3, %rax		; AVX1-NEXT: vcvttsd2si %xmm3, %rax
; AVX1-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; AVX1-NEXT: vcvttsd2si %xmm2, %rcx
; AVX1-NEXT: xorq %rcx, %rax		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: vcvttsd2si %xmm2, %rdx		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: vucomisd %xmm1, %xmm2		; AVX1-NEXT: andq %rax, %rdx
; AVX1-NEXT: cmovaeq %rax, %rdx		; AVX1-NEXT: orq %rcx, %rdx
; AVX1-NEXT: vmovq %rdx, %xmm3		; AVX1-NEXT: vmovq %rdx, %xmm3
; AVX1-NEXT: vpermilpd {{.*#+}} xmm2 = xmm2[1,0]		; AVX1-NEXT: vpermilpd {{.*#+}} xmm2 = xmm2[1,0]
; AVX1-NEXT: vsubsd %xmm1, %xmm2, %xmm4		; AVX1-NEXT: vsubsd %xmm1, %xmm2, %xmm4
; AVX1-NEXT: vcvttsd2si %xmm4, %rax		; AVX1-NEXT: vcvttsd2si %xmm4, %rax
; AVX1-NEXT: xorq %rcx, %rax		; AVX1-NEXT: vcvttsd2si %xmm2, %rcx
; AVX1-NEXT: vcvttsd2si %xmm2, %rdx		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: vucomisd %xmm1, %xmm2		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: cmovaeq %rax, %rdx		; AVX1-NEXT: andq %rax, %rdx
		; AVX1-NEXT: orq %rcx, %rdx
; AVX1-NEXT: vmovq %rdx, %xmm2		; AVX1-NEXT: vmovq %rdx, %xmm2
; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]		; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
; AVX1-NEXT: vsubsd %xmm1, %xmm0, %xmm3		; AVX1-NEXT: vsubsd %xmm1, %xmm0, %xmm3
; AVX1-NEXT: vcvttsd2si %xmm3, %rax		; AVX1-NEXT: vcvttsd2si %xmm3, %rax
; AVX1-NEXT: xorq %rcx, %rax		; AVX1-NEXT: vcvttsd2si %xmm0, %rcx
; AVX1-NEXT: vcvttsd2si %xmm0, %rdx		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: vucomisd %xmm1, %xmm0		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: cmovaeq %rax, %rdx		; AVX1-NEXT: andq %rax, %rdx
		; AVX1-NEXT: orq %rcx, %rdx
; AVX1-NEXT: vmovq %rdx, %xmm3		; AVX1-NEXT: vmovq %rdx, %xmm3
; AVX1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]		; AVX1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
; AVX1-NEXT: vsubsd %xmm1, %xmm0, %xmm4		; AVX1-NEXT: vsubsd %xmm1, %xmm0, %xmm1
; AVX1-NEXT: vcvttsd2si %xmm4, %rax		; AVX1-NEXT: vcvttsd2si %xmm1, %rax
; AVX1-NEXT: xorq %rcx, %rax
; AVX1-NEXT: vcvttsd2si %xmm0, %rcx		; AVX1-NEXT: vcvttsd2si %xmm0, %rcx
; AVX1-NEXT: vucomisd %xmm1, %xmm0		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: cmovaeq %rax, %rcx		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: vmovq %rcx, %xmm0		; AVX1-NEXT: andq %rax, %rdx
		; AVX1-NEXT: orq %rcx, %rdx
		; AVX1-NEXT: vmovq %rdx, %xmm0
; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]		; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: fptoui_4f64_to_4i64:		; AVX2-LABEL: fptoui_4f64_to_4i64:
; AVX2: # %bb.0:		; AVX2: # %bb.0:
; AVX2-NEXT: vextractf128 $1, %ymm0, %xmm2		; AVX2-NEXT: vextractf128 $1, %ymm0, %xmm2
; AVX2-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero		; AVX2-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
; AVX2-NEXT: vsubsd %xmm1, %xmm2, %xmm3		; AVX2-NEXT: vsubsd %xmm1, %xmm2, %xmm3
; AVX2-NEXT: vcvttsd2si %xmm3, %rax		; AVX2-NEXT: vcvttsd2si %xmm3, %rax
; AVX2-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; AVX2-NEXT: vcvttsd2si %xmm2, %rcx
; AVX2-NEXT: xorq %rcx, %rax		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: vcvttsd2si %xmm2, %rdx		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: vucomisd %xmm1, %xmm2		; AVX2-NEXT: andq %rax, %rdx
; AVX2-NEXT: cmovaeq %rax, %rdx		; AVX2-NEXT: orq %rcx, %rdx
; AVX2-NEXT: vmovq %rdx, %xmm3		; AVX2-NEXT: vmovq %rdx, %xmm3
; AVX2-NEXT: vpermilpd {{.*#+}} xmm2 = xmm2[1,0]		; AVX2-NEXT: vpermilpd {{.*#+}} xmm2 = xmm2[1,0]
; AVX2-NEXT: vsubsd %xmm1, %xmm2, %xmm4		; AVX2-NEXT: vsubsd %xmm1, %xmm2, %xmm4
; AVX2-NEXT: vcvttsd2si %xmm4, %rax		; AVX2-NEXT: vcvttsd2si %xmm4, %rax
; AVX2-NEXT: xorq %rcx, %rax		; AVX2-NEXT: vcvttsd2si %xmm2, %rcx
; AVX2-NEXT: vcvttsd2si %xmm2, %rdx		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: vucomisd %xmm1, %xmm2		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: cmovaeq %rax, %rdx		; AVX2-NEXT: andq %rax, %rdx
		; AVX2-NEXT: orq %rcx, %rdx
; AVX2-NEXT: vmovq %rdx, %xmm2		; AVX2-NEXT: vmovq %rdx, %xmm2
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]		; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
; AVX2-NEXT: vsubsd %xmm1, %xmm0, %xmm3		; AVX2-NEXT: vsubsd %xmm1, %xmm0, %xmm3
; AVX2-NEXT: vcvttsd2si %xmm3, %rax		; AVX2-NEXT: vcvttsd2si %xmm3, %rax
; AVX2-NEXT: xorq %rcx, %rax		; AVX2-NEXT: vcvttsd2si %xmm0, %rcx
; AVX2-NEXT: vcvttsd2si %xmm0, %rdx		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: vucomisd %xmm1, %xmm0		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: cmovaeq %rax, %rdx		; AVX2-NEXT: andq %rax, %rdx
		; AVX2-NEXT: orq %rcx, %rdx
; AVX2-NEXT: vmovq %rdx, %xmm3		; AVX2-NEXT: vmovq %rdx, %xmm3
; AVX2-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]		; AVX2-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
; AVX2-NEXT: vsubsd %xmm1, %xmm0, %xmm4		; AVX2-NEXT: vsubsd %xmm1, %xmm0, %xmm1
; AVX2-NEXT: vcvttsd2si %xmm4, %rax		; AVX2-NEXT: vcvttsd2si %xmm1, %rax
; AVX2-NEXT: xorq %rcx, %rax
; AVX2-NEXT: vcvttsd2si %xmm0, %rcx		; AVX2-NEXT: vcvttsd2si %xmm0, %rcx
; AVX2-NEXT: vucomisd %xmm1, %xmm0		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: cmovaeq %rax, %rcx		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: vmovq %rcx, %xmm0		; AVX2-NEXT: andq %rax, %rdx
		; AVX2-NEXT: orq %rcx, %rdx
		; AVX2-NEXT: vmovq %rdx, %xmm0
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]		; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0		; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_4f64_to_4i64:		; AVX512F-LABEL: fptoui_4f64_to_4i64:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: vextractf128 $1, %ymm0, %xmm1		; AVX512F-NEXT: vextractf128 $1, %ymm0, %xmm1
; AVX512F-NEXT: vcvttsd2usi %xmm1, %rax		; AVX512F-NEXT: vcvttsd2usi %xmm1, %rax
[42 lines not shown]
; AVX512VLDQ-NEXT: retq		; AVX512VLDQ-NEXT: retq
%cvt = fptoui <4 x double> %a to <4 x i64>		%cvt = fptoui <4 x double> %a to <4 x i64>
ret <4 x i64> %cvt		ret <4 x i64> %cvt
}		}
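
The new scalar i64 sequences above (cvttsd2si twice, then sarq $63 / andq / orq) can be read as the following C model. This is only a sketch: the function names are hypothetical, and the range check is written out explicitly because an out-of-range double-to-integer cast is undefined behaviour in C, whereas CVTTSD2SI returns 0x8000000000000000 in that case.

```c
#include <stdint.h>

/* Hypothetical model of CVTTSD2SI with a 64-bit destination:
 * truncate toward zero, or return the "integer indefinite" value
 * 0x8000000000000000 for NaN and out-of-range inputs. */
static uint64_t cvttsd2si64_model(double x) {
    if (x >= -9223372036854775808.0 && x < 9223372036854775808.0)
        return (uint64_t)(int64_t)x;
    return UINT64_C(0x8000000000000000);
}

/* Model of the branch-free fptoui lowering shown in the right-hand column. */
static uint64_t fptoui64_model(double x) {
    uint64_t small = cvttsd2si64_model(x);                         /* cvttsd2si x        */
    uint64_t big   = cvttsd2si64_model(x - 9223372036854775808.0); /* subsd + cvttsd2si  */
    uint64_t mask  = (small >> 63) ? UINT64_MAX : 0;               /* sarq $63           */
    return small | (big & mask);                                   /* andq + orq         */
}
```

For inputs below 2^63 the sign bit of `small` is clear, so the mask is zero and `small` is returned unchanged. For inputs in [2^63, 2^64) `small` is 0x8000000000000000, the mask is all ones, and OR-ing in `big` supplies the low 63 bits.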

define <4 x i32> @fptoui_4f64_to_4i32(<4 x double> %a) {		define <4 x i32> @fptoui_4f64_to_4i32(<4 x double> %a) {
; SSE-LABEL: fptoui_4f64_to_4i32:		; SSE-LABEL: fptoui_4f64_to_4i32:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: cvttsd2si %xmm1, %rax		; SSE-NEXT: movapd {{.*#+}} xmm2 = [2.147483648E+9,2.147483648E+9]
; SSE-NEXT: movd %eax, %xmm2		; SSE-NEXT: cvttpd2dq %xmm1, %xmm3
; SSE-NEXT: unpckhpd {{.*#+}} xmm1 = xmm1[1,1]		; SSE-NEXT: subpd %xmm2, %xmm1
; SSE-NEXT: cvttsd2si %xmm1, %rax		; SSE-NEXT: cvttpd2dq %xmm1, %xmm1
; SSE-NEXT: movd %eax, %xmm1		; SSE-NEXT: movapd %xmm3, %xmm4
; SSE-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]		; SSE-NEXT: psrad $31, %xmm4
; SSE-NEXT: cvttsd2si %xmm0, %rax		; SSE-NEXT: pand %xmm1, %xmm4
; SSE-NEXT: movd %eax, %xmm1		; SSE-NEXT: por %xmm3, %xmm4
; SSE-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1,1]		; SSE-NEXT: cvttpd2dq %xmm0, %xmm1
; SSE-NEXT: cvttsd2si %xmm0, %rax		; SSE-NEXT: subpd %xmm2, %xmm0
; SSE-NEXT: movd %eax, %xmm0		; SSE-NEXT: cvttpd2dq %xmm0, %xmm2
; SSE-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]		; SSE-NEXT: movapd %xmm1, %xmm0
; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]		; SSE-NEXT: psrad $31, %xmm0
; SSE-NEXT: movdqa %xmm1, %xmm0		; SSE-NEXT: pand %xmm2, %xmm0
		; SSE-NEXT: por %xmm1, %xmm0
		; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm4[0]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: fptoui_4f64_to_4i32:		; AVX1-LABEL: fptoui_4f64_to_4i32:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vmovapd {{.*#+}} ymm1 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]		; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm1
; AVX1-NEXT: vcmpltpd %ymm1, %ymm0, %ymm2		; AVX1-NEXT: vpsrad $31, %xmm1, %xmm2
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3		; AVX1-NEXT: vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
; AVX1-NEXT: vpackssdw %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vsubpd %ymm1, %ymm0, %ymm1
; AVX1-NEXT: vcvttpd2dq %ymm1, %xmm1
; AVX1-NEXT: vxorpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm0		; AVX1-NEXT: vcvttpd2dq %ymm0, %xmm0
; AVX1-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0		; AVX1-NEXT: vandpd %xmm2, %xmm0, %xmm0
		; AVX1-NEXT: vorpd %xmm0, %xmm1, %xmm0
; AVX1-NEXT: vzeroupper		; AVX1-NEXT: vzeroupper
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: fptoui_4f64_to_4i32:		; AVX2-LABEL: fptoui_4f64_to_4i32:
; AVX2: # %bb.0:		; AVX2: # %bb.0:
; AVX2-NEXT: vbroadcastsd {{.*#+}} ymm1 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]		; AVX2-NEXT: vbroadcastsd {{.*#+}} ymm1 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]
; AVX2-NEXT: vcmpltpd %ymm1, %ymm0, %ymm2
; AVX2-NEXT: vextractf128 $1, %ymm2, %xmm3
; AVX2-NEXT: vpackssdw %xmm3, %xmm2, %xmm2
; AVX2-NEXT: vsubpd %ymm1, %ymm0, %ymm1		; AVX2-NEXT: vsubpd %ymm1, %ymm0, %ymm1
; AVX2-NEXT: vcvttpd2dq %ymm1, %xmm1		; AVX2-NEXT: vcvttpd2dq %ymm1, %xmm1
; AVX2-NEXT: vbroadcastss {{.*#+}} xmm3 = [2147483648,2147483648,2147483648,2147483648]
; AVX2-NEXT: vxorpd %xmm3, %xmm1, %xmm1
; AVX2-NEXT: vcvttpd2dq %ymm0, %xmm0		; AVX2-NEXT: vcvttpd2dq %ymm0, %xmm0
; AVX2-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0		; AVX2-NEXT: vpsrad $31, %xmm0, %xmm2
		; AVX2-NEXT: vandpd %xmm2, %xmm1, %xmm1
		; AVX2-NEXT: vorpd %xmm1, %xmm0, %xmm0
; AVX2-NEXT: vzeroupper		; AVX2-NEXT: vzeroupper
; AVX2-NEXT: retq		; AVX2-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_4f64_to_4i32:		; AVX512F-LABEL: fptoui_4f64_to_4i32:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0		; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
; AVX512F-NEXT: vcvttpd2udq %zmm0, %ymm0		; AVX512F-NEXT: vcvttpd2udq %zmm0, %ymm0
; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
[409 lines not shown]

;		;
; Float to Unsigned Integer		; Float to Unsigned Integer
;		;

define <2 x i32> @fptoui_2f32_to_2i32(<2 x float> %a) {		define <2 x i32> @fptoui_2f32_to_2i32(<2 x float> %a) {
; SSE-LABEL: fptoui_2f32_to_2i32:		; SSE-LABEL: fptoui_2f32_to_2i32:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movaps {{.*#+}} xmm2 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]		; SSE-NEXT: cvttps2dq %xmm0, %xmm1
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movdqa %xmm1, %xmm2
; SSE-NEXT: cmpltps %xmm2, %xmm1		; SSE-NEXT: psrad $31, %xmm2
; SSE-NEXT: cvttps2dq %xmm0, %xmm3		; SSE-NEXT: subps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE-NEXT: subps %xmm2, %xmm0
; SSE-NEXT: cvttps2dq %xmm0, %xmm0		; SSE-NEXT: cvttps2dq %xmm0, %xmm0
; SSE-NEXT: xorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0		; SSE-NEXT: pand %xmm2, %xmm0
; SSE-NEXT: andps %xmm1, %xmm3		; SSE-NEXT: por %xmm1, %xmm0
; SSE-NEXT: andnps %xmm0, %xmm1
; SSE-NEXT: orps %xmm3, %xmm1
; SSE-NEXT: movaps %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: fptoui_2f32_to_2i32:		; AVX1-LABEL: fptoui_2f32_to_2i32:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vmovaps {{.*#+}} xmm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]		; AVX1-NEXT: vcvttps2dq %xmm0, %xmm1
; AVX1-NEXT: vcmpltps %xmm1, %xmm0, %xmm2		; AVX1-NEXT: vpsrad $31, %xmm1, %xmm2
; AVX1-NEXT: vsubps %xmm1, %xmm0, %xmm1		; AVX1-NEXT: vsubps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX1-NEXT: vcvttps2dq %xmm1, %xmm1
; AVX1-NEXT: vxorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
; AVX1-NEXT: vcvttps2dq %xmm0, %xmm0		; AVX1-NEXT: vcvttps2dq %xmm0, %xmm0
; AVX1-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0		; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0
		; AVX1-NEXT: vpor %xmm0, %xmm1, %xmm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: fptoui_2f32_to_2i32:		; AVX2-LABEL: fptoui_2f32_to_2i32:
; AVX2: # %bb.0:		; AVX2: # %bb.0:
; AVX2-NEXT: vbroadcastss {{.*#+}} xmm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]		; AVX2-NEXT: vbroadcastss {{.*#+}} xmm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
; AVX2-NEXT: vcmpltps %xmm1, %xmm0, %xmm2
; AVX2-NEXT: vsubps %xmm1, %xmm0, %xmm1		; AVX2-NEXT: vsubps %xmm1, %xmm0, %xmm1
; AVX2-NEXT: vcvttps2dq %xmm1, %xmm1		; AVX2-NEXT: vcvttps2dq %xmm1, %xmm1
; AVX2-NEXT: vbroadcastss {{.*#+}} xmm3 = [2147483648,2147483648,2147483648,2147483648]
; AVX2-NEXT: vxorps %xmm3, %xmm1, %xmm1
; AVX2-NEXT: vcvttps2dq %xmm0, %xmm0		; AVX2-NEXT: vcvttps2dq %xmm0, %xmm0
; AVX2-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0		; AVX2-NEXT: vpsrad $31, %xmm0, %xmm2
		; AVX2-NEXT: vpand %xmm2, %xmm1, %xmm1
		; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_2f32_to_2i32:		; AVX512F-LABEL: fptoui_2f32_to_2i32:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
; AVX512F-NEXT: vcvttps2udq %zmm0, %zmm0		; AVX512F-NEXT: vcvttps2udq %zmm0, %zmm0
; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
; AVX512F-NEXT: vzeroupper		; AVX512F-NEXT: vzeroupper
[18 lines not shown]
; AVX512VLDQ-NEXT: retq		; AVX512VLDQ-NEXT: retq
%cvt = fptoui <2 x float> %a to <2 x i32>		%cvt = fptoui <2 x float> %a to <2 x i32>
ret <2 x i32> %cvt		ret <2 x i32> %cvt
}		}
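
As a reading aid for the new vXf32 sequences (cvttps2dq, psrad $31, subps, cvttps2dq, pand, por), here is a per-lane C model of the logic from the summary. It is a sketch under assumptions: the helper names are hypothetical, one function call stands in for one vector lane, and the out-of-range case is handled explicitly because the equivalent C cast would be undefined.

```c
#include <stdint.h>

/* Hypothetical model of one lane of CVTTPS2DQ: truncate toward zero,
 * or return 0x80000000 for NaN and out-of-range inputs. */
static uint32_t cvttps2dq_lane_model(float x) {
    if (x >= -2147483648.0f && x < 2147483648.0f)
        return (uint32_t)(int32_t)x;
    return 0x80000000u;
}

/* fp_to_ui(x) := small | (CVTT(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31)) */
static uint32_t fptoui32_lane_model(float x) {
    uint32_t small = cvttps2dq_lane_model(x);                 /* cvttps2dq         */
    uint32_t big   = cvttps2dq_lane_model(x - 2147483648.0f); /* subps + cvttps2dq */
    uint32_t mask  = (small >> 31) ? 0xFFFFFFFFu : 0u;        /* psrad $31         */
    return small | (big & mask);                              /* pand + por        */
}
```

The mask is derived from the sign bit of the first conversion, which is why the new lowering needs no floating-point compare and no blend.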

define <4 x i32> @fptoui_4f32_to_4i32(<4 x float> %a) {		define <4 x i32> @fptoui_4f32_to_4i32(<4 x float> %a) {
; SSE-LABEL: fptoui_4f32_to_4i32:		; SSE-LABEL: fptoui_4f32_to_4i32:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movaps {{.*#+}} xmm2 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]		; SSE-NEXT: cvttps2dq %xmm0, %xmm1
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movdqa %xmm1, %xmm2
; SSE-NEXT: cmpltps %xmm2, %xmm1		; SSE-NEXT: psrad $31, %xmm2
; SSE-NEXT: cvttps2dq %xmm0, %xmm3		; SSE-NEXT: subps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE-NEXT: subps %xmm2, %xmm0
; SSE-NEXT: cvttps2dq %xmm0, %xmm0		; SSE-NEXT: cvttps2dq %xmm0, %xmm0
; SSE-NEXT: xorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0		; SSE-NEXT: pand %xmm2, %xmm0
; SSE-NEXT: andps %xmm1, %xmm3		; SSE-NEXT: por %xmm1, %xmm0
; SSE-NEXT: andnps %xmm0, %xmm1
; SSE-NEXT: orps %xmm3, %xmm1
; SSE-NEXT: movaps %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: fptoui_4f32_to_4i32:		; AVX1-LABEL: fptoui_4f32_to_4i32:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vmovaps {{.*#+}} xmm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]		; AVX1-NEXT: vcvttps2dq %xmm0, %xmm1
; AVX1-NEXT: vcmpltps %xmm1, %xmm0, %xmm2		; AVX1-NEXT: vpsrad $31, %xmm1, %xmm2
; AVX1-NEXT: vsubps %xmm1, %xmm0, %xmm1		; AVX1-NEXT: vsubps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX1-NEXT: vcvttps2dq %xmm1, %xmm1
; AVX1-NEXT: vxorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
; AVX1-NEXT: vcvttps2dq %xmm0, %xmm0		; AVX1-NEXT: vcvttps2dq %xmm0, %xmm0
; AVX1-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0		; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0
		; AVX1-NEXT: vpor %xmm0, %xmm1, %xmm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: fptoui_4f32_to_4i32:		; AVX2-LABEL: fptoui_4f32_to_4i32:
; AVX2: # %bb.0:		; AVX2: # %bb.0:
; AVX2-NEXT: vbroadcastss {{.*#+}} xmm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]		; AVX2-NEXT: vbroadcastss {{.*#+}} xmm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
; AVX2-NEXT: vcmpltps %xmm1, %xmm0, %xmm2
; AVX2-NEXT: vsubps %xmm1, %xmm0, %xmm1		; AVX2-NEXT: vsubps %xmm1, %xmm0, %xmm1
; AVX2-NEXT: vcvttps2dq %xmm1, %xmm1		; AVX2-NEXT: vcvttps2dq %xmm1, %xmm1
; AVX2-NEXT: vbroadcastss {{.*#+}} xmm3 = [2147483648,2147483648,2147483648,2147483648]
; AVX2-NEXT: vxorps %xmm3, %xmm1, %xmm1
; AVX2-NEXT: vcvttps2dq %xmm0, %xmm0		; AVX2-NEXT: vcvttps2dq %xmm0, %xmm0
; AVX2-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0		; AVX2-NEXT: vpsrad $31, %xmm0, %xmm2
		; AVX2-NEXT: vpand %xmm2, %xmm1, %xmm1
		; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_4f32_to_4i32:		; AVX512F-LABEL: fptoui_4f32_to_4i32:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
; AVX512F-NEXT: vcvttps2udq %zmm0, %zmm0		; AVX512F-NEXT: vcvttps2udq %zmm0, %zmm0
; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
; AVX512F-NEXT: vzeroupper		; AVX512F-NEXT: vzeroupper
[22 lines not shown]

define <2 x i64> @fptoui_2f32_to_2i64(<4 x float> %a) {		define <2 x i64> @fptoui_2f32_to_2i64(<4 x float> %a) {
; SSE-LABEL: fptoui_2f32_to_2i64:		; SSE-LABEL: fptoui_2f32_to_2i64:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero		; SSE-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movaps %xmm0, %xmm1
; SSE-NEXT: subss %xmm2, %xmm1		; SSE-NEXT: subss %xmm2, %xmm1
; SSE-NEXT: cvttss2si %xmm1, %rax		; SSE-NEXT: cvttss2si %xmm1, %rax
; SSE-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; SSE-NEXT: cvttss2si %xmm0, %rcx
; SSE-NEXT: xorq %rcx, %rax		; SSE-NEXT: movq %rcx, %rdx
; SSE-NEXT: cvttss2si %xmm0, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomiss %xmm2, %xmm0		; SSE-NEXT: andq %rax, %rdx
; SSE-NEXT: cmovaeq %rax, %rdx		; SSE-NEXT: orq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm1		; SSE-NEXT: movq %rdx, %xmm1
; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]		; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
; SSE-NEXT: movaps %xmm0, %xmm3		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: subss %xmm2, %xmm3		; SSE-NEXT: subss %xmm2, %xmm0
; SSE-NEXT: cvttss2si %xmm3, %rax
; SSE-NEXT: xorq %rcx, %rax
; SSE-NEXT: cvttss2si %xmm0, %rcx		; SSE-NEXT: cvttss2si %xmm0, %rcx
; SSE-NEXT: ucomiss %xmm2, %xmm0		; SSE-NEXT: movq %rax, %rdx
; SSE-NEXT: cmovaeq %rax, %rcx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: movq %rcx, %xmm0		; SSE-NEXT: andq %rcx, %rdx
		; SSE-NEXT: orq %rax, %rdx
		; SSE-NEXT: movq %rdx, %xmm0
; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
; SSE-NEXT: movdqa %xmm1, %xmm0		; SSE-NEXT: movdqa %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; VEX-LABEL: fptoui_2f32_to_2i64:		; VEX-LABEL: fptoui_2f32_to_2i64:
; VEX: # %bb.0:		; VEX: # %bb.0:
; VEX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; VEX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; VEX-NEXT: vsubss %xmm1, %xmm0, %xmm2		; VEX-NEXT: vsubss %xmm1, %xmm0, %xmm2
; VEX-NEXT: vcvttss2si %xmm2, %rax		; VEX-NEXT: vcvttss2si %xmm2, %rax
; VEX-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; VEX-NEXT: vcvttss2si %xmm0, %rcx
; VEX-NEXT: xorq %rcx, %rax		; VEX-NEXT: movq %rcx, %rdx
; VEX-NEXT: vcvttss2si %xmm0, %rdx		; VEX-NEXT: sarq $63, %rdx
; VEX-NEXT: vucomiss %xmm1, %xmm0		; VEX-NEXT: andq %rax, %rdx
; VEX-NEXT: cmovaeq %rax, %rdx		; VEX-NEXT: orq %rcx, %rdx
; VEX-NEXT: vmovq %rdx, %xmm2		; VEX-NEXT: vmovq %rdx, %xmm2
; VEX-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]		; VEX-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
; VEX-NEXT: vsubss %xmm1, %xmm0, %xmm3		; VEX-NEXT: vsubss %xmm1, %xmm0, %xmm1
; VEX-NEXT: vcvttss2si %xmm3, %rax		; VEX-NEXT: vcvttss2si %xmm1, %rax
; VEX-NEXT: xorq %rcx, %rax
; VEX-NEXT: vcvttss2si %xmm0, %rcx		; VEX-NEXT: vcvttss2si %xmm0, %rcx
; VEX-NEXT: vucomiss %xmm1, %xmm0		; VEX-NEXT: movq %rcx, %rdx
; VEX-NEXT: cmovaeq %rax, %rcx		; VEX-NEXT: sarq $63, %rdx
; VEX-NEXT: vmovq %rcx, %xmm0		; VEX-NEXT: andq %rax, %rdx
		; VEX-NEXT: orq %rcx, %rdx
		; VEX-NEXT: vmovq %rdx, %xmm0
; VEX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]		; VEX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
; VEX-NEXT: retq		; VEX-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_2f32_to_2i64:		; AVX512F-LABEL: fptoui_2f32_to_2i64:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: vcvttss2usi %xmm0, %rax		; AVX512F-NEXT: vcvttss2usi %xmm0, %rax
; AVX512F-NEXT: vmovq %rax, %xmm1		; AVX512F-NEXT: vmovq %rax, %xmm1
; AVX512F-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]		; AVX512F-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]

define <2 x i64> @fptoui_4f32_to_2i64(<4 x float> %a) {		define <2 x i64> @fptoui_4f32_to_2i64(<4 x float> %a) {
; SSE-LABEL: fptoui_4f32_to_2i64:		; SSE-LABEL: fptoui_4f32_to_2i64:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero		; SSE-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movaps %xmm0, %xmm1
; SSE-NEXT: subss %xmm2, %xmm1		; SSE-NEXT: subss %xmm2, %xmm1
; SSE-NEXT: cvttss2si %xmm1, %rax		; SSE-NEXT: cvttss2si %xmm1, %rax
; SSE-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; SSE-NEXT: cvttss2si %xmm0, %rcx
; SSE-NEXT: xorq %rcx, %rax		; SSE-NEXT: movq %rcx, %rdx
; SSE-NEXT: cvttss2si %xmm0, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomiss %xmm2, %xmm0		; SSE-NEXT: andq %rax, %rdx
; SSE-NEXT: cmovaeq %rax, %rdx		; SSE-NEXT: orq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm1		; SSE-NEXT: movq %rdx, %xmm1
; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]		; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
; SSE-NEXT: movaps %xmm0, %xmm3		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: subss %xmm2, %xmm3		; SSE-NEXT: subss %xmm2, %xmm0
; SSE-NEXT: cvttss2si %xmm3, %rax
; SSE-NEXT: xorq %rcx, %rax
; SSE-NEXT: cvttss2si %xmm0, %rcx		; SSE-NEXT: cvttss2si %xmm0, %rcx
; SSE-NEXT: ucomiss %xmm2, %xmm0		; SSE-NEXT: movq %rax, %rdx
; SSE-NEXT: cmovaeq %rax, %rcx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: movq %rcx, %xmm0		; SSE-NEXT: andq %rcx, %rdx
		; SSE-NEXT: orq %rax, %rdx
		; SSE-NEXT: movq %rdx, %xmm0
; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
; SSE-NEXT: movdqa %xmm1, %xmm0		; SSE-NEXT: movdqa %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; VEX-LABEL: fptoui_4f32_to_2i64:		; VEX-LABEL: fptoui_4f32_to_2i64:
; VEX: # %bb.0:		; VEX: # %bb.0:
; VEX-NEXT: vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]		; VEX-NEXT: vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
; VEX-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero		; VEX-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
; VEX-NEXT: vsubss %xmm2, %xmm1, %xmm3		; VEX-NEXT: vsubss %xmm2, %xmm1, %xmm3
; VEX-NEXT: vcvttss2si %xmm3, %rax		; VEX-NEXT: vcvttss2si %xmm3, %rax
; VEX-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; VEX-NEXT: vcvttss2si %xmm1, %rcx
; VEX-NEXT: xorq %rcx, %rax		; VEX-NEXT: movq %rcx, %rdx
; VEX-NEXT: vcvttss2si %xmm1, %rdx		; VEX-NEXT: sarq $63, %rdx
; VEX-NEXT: vucomiss %xmm2, %xmm1		; VEX-NEXT: andq %rax, %rdx
; VEX-NEXT: cmovaeq %rax, %rdx		; VEX-NEXT: orq %rcx, %rdx
; VEX-NEXT: vsubss %xmm2, %xmm0, %xmm1		; VEX-NEXT: vsubss %xmm2, %xmm0, %xmm1
; VEX-NEXT: vcvttss2si %xmm1, %rax		; VEX-NEXT: vcvttss2si %xmm1, %rax
; VEX-NEXT: xorq %rcx, %rax
; VEX-NEXT: vcvttss2si %xmm0, %rcx		; VEX-NEXT: vcvttss2si %xmm0, %rcx
; VEX-NEXT: vucomiss %xmm2, %xmm0		; VEX-NEXT: movq %rcx, %rsi
; VEX-NEXT: cmovaeq %rax, %rcx		; VEX-NEXT: sarq $63, %rsi
; VEX-NEXT: vmovq %rcx, %xmm0		; VEX-NEXT: andq %rax, %rsi
		; VEX-NEXT: orq %rcx, %rsi
		; VEX-NEXT: vmovq %rsi, %xmm0
; VEX-NEXT: vmovq %rdx, %xmm1		; VEX-NEXT: vmovq %rdx, %xmm1
; VEX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; VEX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
; VEX-NEXT: retq		; VEX-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_4f32_to_2i64:		; AVX512F-LABEL: fptoui_4f32_to_2i64:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]		; AVX512F-NEXT: vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
; AVX512F-NEXT: vcvttss2usi %xmm1, %rax		; AVX512F-NEXT: vcvttss2usi %xmm1, %rax
		; AVX512VLDQ-NEXT: retq
%cvt = fptoui <4 x float> %a to <4 x i64>		%cvt = fptoui <4 x float> %a to <4 x i64>
%shuf = shufflevector <4 x i64> %cvt, <4 x i64> undef, <2 x i32> <i32 0, i32 1>		%shuf = shufflevector <4 x i64> %cvt, <4 x i64> undef, <2 x i32> <i32 0, i32 1>
ret <2 x i64> %shuf		ret <2 x i64> %shuf
}		}

define <8 x i32> @fptoui_8f32_to_8i32(<8 x float> %a) {		define <8 x i32> @fptoui_8f32_to_8i32(<8 x float> %a) {
; SSE-LABEL: fptoui_8f32_to_8i32:		; SSE-LABEL: fptoui_8f32_to_8i32:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movaps {{.*#+}} xmm4 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]		; SSE-NEXT: movaps {{.*#+}} xmm2 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
; SSE-NEXT: movaps %xmm0, %xmm2
; SSE-NEXT: cmpltps %xmm4, %xmm2
; SSE-NEXT: cvttps2dq %xmm0, %xmm3		; SSE-NEXT: cvttps2dq %xmm0, %xmm3
; SSE-NEXT: subps %xmm4, %xmm0		; SSE-NEXT: subps %xmm2, %xmm0
; SSE-NEXT: cvttps2dq %xmm0, %xmm0		; SSE-NEXT: cvttps2dq %xmm0, %xmm4
; SSE-NEXT: movaps {{.*#+}} xmm5 = [2147483648,2147483648,2147483648,2147483648]		; SSE-NEXT: movdqa %xmm3, %xmm0
; SSE-NEXT: xorps %xmm5, %xmm0		; SSE-NEXT: psrad $31, %xmm0
; SSE-NEXT: andps %xmm2, %xmm3		; SSE-NEXT: pand %xmm4, %xmm0
; SSE-NEXT: andnps %xmm0, %xmm2		; SSE-NEXT: por %xmm3, %xmm0
; SSE-NEXT: orps %xmm3, %xmm2		; SSE-NEXT: cvttps2dq %xmm1, %xmm3
; SSE-NEXT: movaps %xmm1, %xmm3		; SSE-NEXT: subps %xmm2, %xmm1
; SSE-NEXT: cmpltps %xmm4, %xmm3		; SSE-NEXT: cvttps2dq %xmm1, %xmm2
; SSE-NEXT: cvttps2dq %xmm1, %xmm0		; SSE-NEXT: movdqa %xmm3, %xmm1
; SSE-NEXT: subps %xmm4, %xmm1		; SSE-NEXT: psrad $31, %xmm1
; SSE-NEXT: cvttps2dq %xmm1, %xmm1		; SSE-NEXT: pand %xmm2, %xmm1
; SSE-NEXT: xorps %xmm5, %xmm1		; SSE-NEXT: por %xmm3, %xmm1
; SSE-NEXT: andps %xmm3, %xmm0
; SSE-NEXT: andnps %xmm1, %xmm3
; SSE-NEXT: orps %xmm0, %xmm3
; SSE-NEXT: movaps %xmm2, %xmm0
; SSE-NEXT: movaps %xmm3, %xmm1
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: fptoui_8f32_to_8i32:		; AVX1-LABEL: fptoui_8f32_to_8i32:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vmovaps {{.*#+}} ymm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]		; AVX1-NEXT: vcvttps2dq %ymm0, %ymm1
; AVX1-NEXT: vcmpltps %ymm1, %ymm0, %ymm2		; AVX1-NEXT: vsubps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
; AVX1-NEXT: vsubps %ymm1, %ymm0, %ymm1
; AVX1-NEXT: vcvttps2dq %ymm1, %ymm1
; AVX1-NEXT: vxorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
; AVX1-NEXT: vcvttps2dq %ymm0, %ymm0		; AVX1-NEXT: vcvttps2dq %ymm0, %ymm0
; AVX1-NEXT: vblendvps %ymm2, %ymm0, %ymm1, %ymm0		; AVX1-NEXT: vorps %ymm0, %ymm1, %ymm0
		; AVX1-NEXT: vblendvps %ymm1, %ymm0, %ymm1, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: fptoui_8f32_to_8i32:		; AVX2-LABEL: fptoui_8f32_to_8i32:
; AVX2: # %bb.0:		; AVX2: # %bb.0:
; AVX2-NEXT: vbroadcastss {{.*#+}} ymm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]		; AVX2-NEXT: vbroadcastss {{.*#+}} ymm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
; AVX2-NEXT: vcmpltps %ymm1, %ymm0, %ymm2
; AVX2-NEXT: vsubps %ymm1, %ymm0, %ymm1		; AVX2-NEXT: vsubps %ymm1, %ymm0, %ymm1
; AVX2-NEXT: vcvttps2dq %ymm1, %ymm1		; AVX2-NEXT: vcvttps2dq %ymm1, %ymm1
; AVX2-NEXT: vbroadcastss {{.*#+}} ymm3 = [2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648]
; AVX2-NEXT: vxorps %ymm3, %ymm1, %ymm1
; AVX2-NEXT: vcvttps2dq %ymm0, %ymm0		; AVX2-NEXT: vcvttps2dq %ymm0, %ymm0
; AVX2-NEXT: vblendvps %ymm2, %ymm0, %ymm1, %ymm0		; AVX2-NEXT: vpsrad $31, %ymm0, %ymm2
		; AVX2-NEXT: vpand %ymm2, %ymm1, %ymm1
		; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_8f32_to_8i32:		; AVX512F-LABEL: fptoui_8f32_to_8i32:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0		; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
; AVX512F-NEXT: vcvttps2udq %zmm0, %zmm0		; AVX512F-NEXT: vcvttps2udq %zmm0, %zmm0
; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 killed $zmm0		; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 killed $zmm0
; AVX512F-NEXT: retq		; AVX512F-NEXT: retq
}		}

define <4 x i64> @fptoui_4f32_to_4i64(<8 x float> %a) {		define <4 x i64> @fptoui_4f32_to_4i64(<8 x float> %a) {
; SSE-LABEL: fptoui_4f32_to_4i64:		; SSE-LABEL: fptoui_4f32_to_4i64:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; SSE-NEXT: movaps %xmm0, %xmm2		; SSE-NEXT: movaps %xmm0, %xmm2
; SSE-NEXT: subss %xmm1, %xmm2		; SSE-NEXT: subss %xmm1, %xmm2
; SSE-NEXT: cvttss2si %xmm2, %rcx		; SSE-NEXT: cvttss2si %xmm2, %rax
; SSE-NEXT: movabsq $-9223372036854775808, %rax # imm = 0x8000000000000000		; SSE-NEXT: cvttss2si %xmm0, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: movq %rcx, %rdx
; SSE-NEXT: cvttss2si %xmm0, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm0		; SSE-NEXT: andq %rax, %rdx
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: orq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm2		; SSE-NEXT: movq %rdx, %xmm2
; SSE-NEXT: movaps %xmm0, %xmm3		; SSE-NEXT: movaps %xmm0, %xmm3
; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[1,1],xmm0[1,1]		; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[1,1],xmm0[1,1]
; SSE-NEXT: movaps %xmm3, %xmm4		; SSE-NEXT: cvttss2si %xmm3, %rax
; SSE-NEXT: subss %xmm1, %xmm4		; SSE-NEXT: subss %xmm1, %xmm3
; SSE-NEXT: cvttss2si %xmm4, %rcx		; SSE-NEXT: cvttss2si %xmm3, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: movq %rax, %rdx
; SSE-NEXT: cvttss2si %xmm3, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm3		; SSE-NEXT: andq %rcx, %rdx
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: orq %rax, %rdx
; SSE-NEXT: movq %rdx, %xmm3		; SSE-NEXT: movq %rdx, %xmm3
; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
; SSE-NEXT: movaps %xmm0, %xmm3		; SSE-NEXT: movaps %xmm0, %xmm3
; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,3],xmm0[3,3]		; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,3],xmm0[3,3]
; SSE-NEXT: movaps %xmm3, %xmm4		; SSE-NEXT: cvttss2si %xmm3, %rax
; SSE-NEXT: subss %xmm1, %xmm4		; SSE-NEXT: subss %xmm1, %xmm3
; SSE-NEXT: cvttss2si %xmm4, %rcx		; SSE-NEXT: cvttss2si %xmm3, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: movq %rax, %rdx
; SSE-NEXT: cvttss2si %xmm3, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm3		; SSE-NEXT: andq %rcx, %rdx
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: orq %rax, %rdx
; SSE-NEXT: movq %rdx, %xmm3		; SSE-NEXT: movq %rdx, %xmm3
; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
; SSE-NEXT: movaps %xmm0, %xmm4
; SSE-NEXT: subss %xmm1, %xmm4
; SSE-NEXT: cvttss2si %xmm4, %rcx
; SSE-NEXT: xorq %rax, %rcx
; SSE-NEXT: cvttss2si %xmm0, %rax		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: ucomiss %xmm1, %xmm0		; SSE-NEXT: subss %xmm1, %xmm0
; SSE-NEXT: cmovaeq %rcx, %rax		; SSE-NEXT: cvttss2si %xmm0, %rcx
; SSE-NEXT: movq %rax, %xmm1		; SSE-NEXT: movq %rax, %rdx
		; SSE-NEXT: sarq $63, %rdx
		; SSE-NEXT: andq %rcx, %rdx
		; SSE-NEXT: orq %rax, %rdx
		; SSE-NEXT: movq %rdx, %xmm1
; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm3[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm3[0]
; SSE-NEXT: movdqa %xmm2, %xmm0		; SSE-NEXT: movdqa %xmm2, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: fptoui_4f32_to_4i64:		; AVX1-LABEL: fptoui_4f32_to_4i64:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[3,3,3,3]		; AVX1-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[3,3,3,3]
; AVX1-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; AVX1-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; AVX1-NEXT: vsubss %xmm1, %xmm2, %xmm3		; AVX1-NEXT: vsubss %xmm1, %xmm2, %xmm3
; AVX1-NEXT: vcvttss2si %xmm3, %rax		; AVX1-NEXT: vcvttss2si %xmm3, %rax
; AVX1-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; AVX1-NEXT: vcvttss2si %xmm2, %rcx
; AVX1-NEXT: xorq %rcx, %rax		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: vcvttss2si %xmm2, %rdx		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: vucomiss %xmm1, %xmm2		; AVX1-NEXT: andq %rax, %rdx
; AVX1-NEXT: cmovaeq %rax, %rdx		; AVX1-NEXT: orq %rcx, %rdx
; AVX1-NEXT: vmovq %rdx, %xmm2		; AVX1-NEXT: vmovq %rdx, %xmm2
; AVX1-NEXT: vpermilpd {{.*#+}} xmm3 = xmm0[1,0]		; AVX1-NEXT: vpermilpd {{.*#+}} xmm3 = xmm0[1,0]
; AVX1-NEXT: vsubss %xmm1, %xmm3, %xmm4		; AVX1-NEXT: vsubss %xmm1, %xmm3, %xmm4
; AVX1-NEXT: vcvttss2si %xmm4, %rax		; AVX1-NEXT: vcvttss2si %xmm4, %rax
; AVX1-NEXT: xorq %rcx, %rax		; AVX1-NEXT: vcvttss2si %xmm3, %rcx
; AVX1-NEXT: vcvttss2si %xmm3, %rdx		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: vucomiss %xmm1, %xmm3		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: cmovaeq %rax, %rdx		; AVX1-NEXT: andq %rax, %rdx
		; AVX1-NEXT: orq %rcx, %rdx
; AVX1-NEXT: vmovq %rdx, %xmm3		; AVX1-NEXT: vmovq %rdx, %xmm3
; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]		; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
; AVX1-NEXT: vsubss %xmm1, %xmm0, %xmm3		; AVX1-NEXT: vsubss %xmm1, %xmm0, %xmm3
; AVX1-NEXT: vcvttss2si %xmm3, %rax		; AVX1-NEXT: vcvttss2si %xmm3, %rax
; AVX1-NEXT: xorq %rcx, %rax		; AVX1-NEXT: vcvttss2si %xmm0, %rcx
; AVX1-NEXT: vcvttss2si %xmm0, %rdx		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: vucomiss %xmm1, %xmm0		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: cmovaeq %rax, %rdx		; AVX1-NEXT: andq %rax, %rdx
		; AVX1-NEXT: orq %rcx, %rdx
; AVX1-NEXT: vmovq %rdx, %xmm3		; AVX1-NEXT: vmovq %rdx, %xmm3
; AVX1-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]		; AVX1-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
; AVX1-NEXT: vsubss %xmm1, %xmm0, %xmm4		; AVX1-NEXT: vsubss %xmm1, %xmm0, %xmm1
; AVX1-NEXT: vcvttss2si %xmm4, %rax		; AVX1-NEXT: vcvttss2si %xmm1, %rax
; AVX1-NEXT: xorq %rcx, %rax
; AVX1-NEXT: vcvttss2si %xmm0, %rcx		; AVX1-NEXT: vcvttss2si %xmm0, %rcx
; AVX1-NEXT: vucomiss %xmm1, %xmm0		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: cmovaeq %rax, %rcx		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: vmovq %rcx, %xmm0		; AVX1-NEXT: andq %rax, %rdx
		; AVX1-NEXT: orq %rcx, %rdx
		; AVX1-NEXT: vmovq %rdx, %xmm0
; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]		; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: fptoui_4f32_to_4i64:		; AVX2-LABEL: fptoui_4f32_to_4i64:
; AVX2: # %bb.0:		; AVX2: # %bb.0:
; AVX2-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[3,3,3,3]		; AVX2-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[3,3,3,3]
; AVX2-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; AVX2-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; AVX2-NEXT: vsubss %xmm1, %xmm2, %xmm3		; AVX2-NEXT: vsubss %xmm1, %xmm2, %xmm3
; AVX2-NEXT: vcvttss2si %xmm3, %rax		; AVX2-NEXT: vcvttss2si %xmm3, %rax
; AVX2-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; AVX2-NEXT: vcvttss2si %xmm2, %rcx
; AVX2-NEXT: xorq %rcx, %rax		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: vcvttss2si %xmm2, %rdx		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: vucomiss %xmm1, %xmm2		; AVX2-NEXT: andq %rax, %rdx
; AVX2-NEXT: cmovaeq %rax, %rdx		; AVX2-NEXT: orq %rcx, %rdx
; AVX2-NEXT: vmovq %rdx, %xmm2		; AVX2-NEXT: vmovq %rdx, %xmm2
; AVX2-NEXT: vpermilpd {{.*#+}} xmm3 = xmm0[1,0]		; AVX2-NEXT: vpermilpd {{.*#+}} xmm3 = xmm0[1,0]
; AVX2-NEXT: vsubss %xmm1, %xmm3, %xmm4		; AVX2-NEXT: vsubss %xmm1, %xmm3, %xmm4
; AVX2-NEXT: vcvttss2si %xmm4, %rax		; AVX2-NEXT: vcvttss2si %xmm4, %rax
; AVX2-NEXT: xorq %rcx, %rax		; AVX2-NEXT: vcvttss2si %xmm3, %rcx
; AVX2-NEXT: vcvttss2si %xmm3, %rdx		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: vucomiss %xmm1, %xmm3		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: cmovaeq %rax, %rdx		; AVX2-NEXT: andq %rax, %rdx
		; AVX2-NEXT: orq %rcx, %rdx
; AVX2-NEXT: vmovq %rdx, %xmm3		; AVX2-NEXT: vmovq %rdx, %xmm3
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]		; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
; AVX2-NEXT: vsubss %xmm1, %xmm0, %xmm3		; AVX2-NEXT: vsubss %xmm1, %xmm0, %xmm3
; AVX2-NEXT: vcvttss2si %xmm3, %rax		; AVX2-NEXT: vcvttss2si %xmm3, %rax
; AVX2-NEXT: xorq %rcx, %rax		; AVX2-NEXT: vcvttss2si %xmm0, %rcx
; AVX2-NEXT: vcvttss2si %xmm0, %rdx		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: vucomiss %xmm1, %xmm0		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: cmovaeq %rax, %rdx		; AVX2-NEXT: andq %rax, %rdx
		; AVX2-NEXT: orq %rcx, %rdx
; AVX2-NEXT: vmovq %rdx, %xmm3		; AVX2-NEXT: vmovq %rdx, %xmm3
; AVX2-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]		; AVX2-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
; AVX2-NEXT: vsubss %xmm1, %xmm0, %xmm4		; AVX2-NEXT: vsubss %xmm1, %xmm0, %xmm1
; AVX2-NEXT: vcvttss2si %xmm4, %rax		; AVX2-NEXT: vcvttss2si %xmm1, %rax
; AVX2-NEXT: xorq %rcx, %rax
; AVX2-NEXT: vcvttss2si %xmm0, %rcx		; AVX2-NEXT: vcvttss2si %xmm0, %rcx
; AVX2-NEXT: vucomiss %xmm1, %xmm0		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: cmovaeq %rax, %rcx		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: vmovq %rcx, %xmm0		; AVX2-NEXT: andq %rax, %rdx
		; AVX2-NEXT: orq %rcx, %rdx
		; AVX2-NEXT: vmovq %rdx, %xmm0
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]		; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0		; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_4f32_to_4i64:		; AVX512F-LABEL: fptoui_4f32_to_4i64:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[3,3,3,3]		; AVX512F-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[3,3,3,3]
; AVX512F-NEXT: vcvttss2usi %xmm1, %rax		; AVX512F-NEXT: vcvttss2usi %xmm1, %rax
}		}

define <4 x i64> @fptoui_8f32_to_4i64(<8 x float> %a) {		define <4 x i64> @fptoui_8f32_to_4i64(<8 x float> %a) {
; SSE-LABEL: fptoui_8f32_to_4i64:		; SSE-LABEL: fptoui_8f32_to_4i64:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; SSE-NEXT: movaps %xmm0, %xmm2		; SSE-NEXT: movaps %xmm0, %xmm2
; SSE-NEXT: subss %xmm1, %xmm2		; SSE-NEXT: subss %xmm1, %xmm2
; SSE-NEXT: cvttss2si %xmm2, %rcx		; SSE-NEXT: cvttss2si %xmm2, %rax
; SSE-NEXT: movabsq $-9223372036854775808, %rax # imm = 0x8000000000000000		; SSE-NEXT: cvttss2si %xmm0, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: movq %rcx, %rdx
; SSE-NEXT: cvttss2si %xmm0, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm0		; SSE-NEXT: andq %rax, %rdx
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: orq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm2		; SSE-NEXT: movq %rdx, %xmm2
; SSE-NEXT: movaps %xmm0, %xmm3		; SSE-NEXT: movaps %xmm0, %xmm3
; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[1,1],xmm0[1,1]		; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[1,1],xmm0[1,1]
; SSE-NEXT: movaps %xmm3, %xmm4		; SSE-NEXT: cvttss2si %xmm3, %rax
; SSE-NEXT: subss %xmm1, %xmm4		; SSE-NEXT: subss %xmm1, %xmm3
; SSE-NEXT: cvttss2si %xmm4, %rcx		; SSE-NEXT: cvttss2si %xmm3, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: movq %rax, %rdx
; SSE-NEXT: cvttss2si %xmm3, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm3		; SSE-NEXT: andq %rcx, %rdx
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: orq %rax, %rdx
; SSE-NEXT: movq %rdx, %xmm3		; SSE-NEXT: movq %rdx, %xmm3
; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
; SSE-NEXT: movaps %xmm0, %xmm3		; SSE-NEXT: movaps %xmm0, %xmm3
; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,3],xmm0[3,3]		; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,3],xmm0[3,3]
; SSE-NEXT: movaps %xmm3, %xmm4		; SSE-NEXT: cvttss2si %xmm3, %rax
; SSE-NEXT: subss %xmm1, %xmm4		; SSE-NEXT: subss %xmm1, %xmm3
; SSE-NEXT: cvttss2si %xmm4, %rcx		; SSE-NEXT: cvttss2si %xmm3, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: movq %rax, %rdx
; SSE-NEXT: cvttss2si %xmm3, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm3		; SSE-NEXT: andq %rcx, %rdx
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: orq %rax, %rdx
; SSE-NEXT: movq %rdx, %xmm3		; SSE-NEXT: movq %rdx, %xmm3
; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
; SSE-NEXT: movaps %xmm0, %xmm4
; SSE-NEXT: subss %xmm1, %xmm4
; SSE-NEXT: cvttss2si %xmm4, %rcx
; SSE-NEXT: xorq %rax, %rcx
; SSE-NEXT: cvttss2si %xmm0, %rax		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: ucomiss %xmm1, %xmm0		; SSE-NEXT: subss %xmm1, %xmm0
; SSE-NEXT: cmovaeq %rcx, %rax		; SSE-NEXT: cvttss2si %xmm0, %rcx
; SSE-NEXT: movq %rax, %xmm1		; SSE-NEXT: movq %rax, %rdx
		; SSE-NEXT: sarq $63, %rdx
		; SSE-NEXT: andq %rcx, %rdx
		; SSE-NEXT: orq %rax, %rdx
		; SSE-NEXT: movq %rdx, %xmm1
; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm3[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm3[0]
; SSE-NEXT: movdqa %xmm2, %xmm0		; SSE-NEXT: movdqa %xmm2, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: fptoui_8f32_to_4i64:		; AVX1-LABEL: fptoui_8f32_to_4i64:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[3,3,3,3]		; AVX1-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[3,3,3,3]
; AVX1-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; AVX1-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; AVX1-NEXT: vsubss %xmm1, %xmm2, %xmm3		; AVX1-NEXT: vsubss %xmm1, %xmm2, %xmm3
; AVX1-NEXT: vcvttss2si %xmm3, %rax		; AVX1-NEXT: vcvttss2si %xmm3, %rax
; AVX1-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; AVX1-NEXT: vcvttss2si %xmm2, %rcx
; AVX1-NEXT: xorq %rcx, %rax		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: vcvttss2si %xmm2, %rdx		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: vucomiss %xmm1, %xmm2		; AVX1-NEXT: andq %rax, %rdx
; AVX1-NEXT: cmovaeq %rax, %rdx		; AVX1-NEXT: orq %rcx, %rdx
; AVX1-NEXT: vmovq %rdx, %xmm2		; AVX1-NEXT: vmovq %rdx, %xmm2
; AVX1-NEXT: vpermilpd {{.*#+}} xmm3 = xmm0[1,0]		; AVX1-NEXT: vpermilpd {{.*#+}} xmm3 = xmm0[1,0]
; AVX1-NEXT: vsubss %xmm1, %xmm3, %xmm4		; AVX1-NEXT: vsubss %xmm1, %xmm3, %xmm4
; AVX1-NEXT: vcvttss2si %xmm4, %rax		; AVX1-NEXT: vcvttss2si %xmm4, %rax
; AVX1-NEXT: xorq %rcx, %rax		; AVX1-NEXT: vcvttss2si %xmm3, %rcx
; AVX1-NEXT: vcvttss2si %xmm3, %rdx		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: vucomiss %xmm1, %xmm3		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: cmovaeq %rax, %rdx		; AVX1-NEXT: andq %rax, %rdx
		; AVX1-NEXT: orq %rcx, %rdx
; AVX1-NEXT: vmovq %rdx, %xmm3		; AVX1-NEXT: vmovq %rdx, %xmm3
; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]		; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
; AVX1-NEXT: vsubss %xmm1, %xmm0, %xmm3		; AVX1-NEXT: vsubss %xmm1, %xmm0, %xmm3
; AVX1-NEXT: vcvttss2si %xmm3, %rax		; AVX1-NEXT: vcvttss2si %xmm3, %rax
; AVX1-NEXT: xorq %rcx, %rax		; AVX1-NEXT: vcvttss2si %xmm0, %rcx
; AVX1-NEXT: vcvttss2si %xmm0, %rdx		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: vucomiss %xmm1, %xmm0		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: cmovaeq %rax, %rdx		; AVX1-NEXT: andq %rax, %rdx
		; AVX1-NEXT: orq %rcx, %rdx
; AVX1-NEXT: vmovq %rdx, %xmm3		; AVX1-NEXT: vmovq %rdx, %xmm3
; AVX1-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]		; AVX1-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
; AVX1-NEXT: vsubss %xmm1, %xmm0, %xmm4		; AVX1-NEXT: vsubss %xmm1, %xmm0, %xmm1
; AVX1-NEXT: vcvttss2si %xmm4, %rax		; AVX1-NEXT: vcvttss2si %xmm1, %rax
; AVX1-NEXT: xorq %rcx, %rax
; AVX1-NEXT: vcvttss2si %xmm0, %rcx		; AVX1-NEXT: vcvttss2si %xmm0, %rcx
; AVX1-NEXT: vucomiss %xmm1, %xmm0		; AVX1-NEXT: movq %rcx, %rdx
; AVX1-NEXT: cmovaeq %rax, %rcx		; AVX1-NEXT: sarq $63, %rdx
; AVX1-NEXT: vmovq %rcx, %xmm0		; AVX1-NEXT: andq %rax, %rdx
		; AVX1-NEXT: orq %rcx, %rdx
		; AVX1-NEXT: vmovq %rdx, %xmm0
; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]		; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: fptoui_8f32_to_4i64:		; AVX2-LABEL: fptoui_8f32_to_4i64:
; AVX2: # %bb.0:		; AVX2: # %bb.0:
; AVX2-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[3,3,3,3]		; AVX2-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[3,3,3,3]
; AVX2-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; AVX2-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; AVX2-NEXT: vsubss %xmm1, %xmm2, %xmm3		; AVX2-NEXT: vsubss %xmm1, %xmm2, %xmm3
; AVX2-NEXT: vcvttss2si %xmm3, %rax		; AVX2-NEXT: vcvttss2si %xmm3, %rax
; AVX2-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; AVX2-NEXT: vcvttss2si %xmm2, %rcx
; AVX2-NEXT: xorq %rcx, %rax		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: vcvttss2si %xmm2, %rdx		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: vucomiss %xmm1, %xmm2		; AVX2-NEXT: andq %rax, %rdx
; AVX2-NEXT: cmovaeq %rax, %rdx		; AVX2-NEXT: orq %rcx, %rdx
; AVX2-NEXT: vmovq %rdx, %xmm2		; AVX2-NEXT: vmovq %rdx, %xmm2
; AVX2-NEXT: vpermilpd {{.*#+}} xmm3 = xmm0[1,0]		; AVX2-NEXT: vpermilpd {{.*#+}} xmm3 = xmm0[1,0]
; AVX2-NEXT: vsubss %xmm1, %xmm3, %xmm4		; AVX2-NEXT: vsubss %xmm1, %xmm3, %xmm4
; AVX2-NEXT: vcvttss2si %xmm4, %rax		; AVX2-NEXT: vcvttss2si %xmm4, %rax
; AVX2-NEXT: xorq %rcx, %rax		; AVX2-NEXT: vcvttss2si %xmm3, %rcx
; AVX2-NEXT: vcvttss2si %xmm3, %rdx		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: vucomiss %xmm1, %xmm3		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: cmovaeq %rax, %rdx		; AVX2-NEXT: andq %rax, %rdx
		; AVX2-NEXT: orq %rcx, %rdx
; AVX2-NEXT: vmovq %rdx, %xmm3		; AVX2-NEXT: vmovq %rdx, %xmm3
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]		; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
; AVX2-NEXT: vsubss %xmm1, %xmm0, %xmm3		; AVX2-NEXT: vsubss %xmm1, %xmm0, %xmm3
; AVX2-NEXT: vcvttss2si %xmm3, %rax		; AVX2-NEXT: vcvttss2si %xmm3, %rax
; AVX2-NEXT: xorq %rcx, %rax		; AVX2-NEXT: vcvttss2si %xmm0, %rcx
; AVX2-NEXT: vcvttss2si %xmm0, %rdx		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: vucomiss %xmm1, %xmm0		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: cmovaeq %rax, %rdx		; AVX2-NEXT: andq %rax, %rdx
		; AVX2-NEXT: orq %rcx, %rdx
; AVX2-NEXT: vmovq %rdx, %xmm3		; AVX2-NEXT: vmovq %rdx, %xmm3
; AVX2-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]		; AVX2-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
; AVX2-NEXT: vsubss %xmm1, %xmm0, %xmm4		; AVX2-NEXT: vsubss %xmm1, %xmm0, %xmm1
; AVX2-NEXT: vcvttss2si %xmm4, %rax		; AVX2-NEXT: vcvttss2si %xmm1, %rax
; AVX2-NEXT: xorq %rcx, %rax
; AVX2-NEXT: vcvttss2si %xmm0, %rcx		; AVX2-NEXT: vcvttss2si %xmm0, %rcx
; AVX2-NEXT: vucomiss %xmm1, %xmm0		; AVX2-NEXT: movq %rcx, %rdx
; AVX2-NEXT: cmovaeq %rax, %rcx		; AVX2-NEXT: sarq $63, %rdx
; AVX2-NEXT: vmovq %rcx, %xmm0		; AVX2-NEXT: andq %rax, %rdx
		; AVX2-NEXT: orq %rcx, %rdx
		; AVX2-NEXT: vmovq %rdx, %xmm0
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]		; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0		; AVX2-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_8f32_to_4i64:		; AVX512F-LABEL: fptoui_8f32_to_4i64:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]		; AVX512F-NEXT: vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
; AVX512F-NEXT: vcvttss2usi %xmm1, %rax		; AVX512F-NEXT: vcvttss2usi %xmm1, %rax
define <2 x i64> @fptoui_2f32_to_2i64_load(<2 x float>* %x) {		define <2 x i64> @fptoui_2f32_to_2i64_load(<2 x float>* %x) {
; SSE-LABEL: fptoui_2f32_to_2i64_load:		; SSE-LABEL: fptoui_2f32_to_2i64_load:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero		; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
; SSE-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero		; SSE-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
; SSE-NEXT: movaps %xmm1, %xmm0		; SSE-NEXT: movaps %xmm1, %xmm0
; SSE-NEXT: subss %xmm2, %xmm0		; SSE-NEXT: subss %xmm2, %xmm0
; SSE-NEXT: cvttss2si %xmm0, %rax		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; SSE-NEXT: cvttss2si %xmm1, %rcx
; SSE-NEXT: xorq %rcx, %rax		; SSE-NEXT: movq %rcx, %rdx
; SSE-NEXT: cvttss2si %xmm1, %rdx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: ucomiss %xmm2, %xmm1		; SSE-NEXT: andq %rax, %rdx
; SSE-NEXT: cmovaeq %rax, %rdx		; SSE-NEXT: orq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm0		; SSE-NEXT: movq %rdx, %xmm0
; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,1,1]		; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,1,1]
; SSE-NEXT: movaps %xmm1, %xmm3		; SSE-NEXT: cvttss2si %xmm1, %rax
; SSE-NEXT: subss %xmm2, %xmm3		; SSE-NEXT: subss %xmm2, %xmm1
; SSE-NEXT: cvttss2si %xmm3, %rax
; SSE-NEXT: xorq %rcx, %rax
; SSE-NEXT: cvttss2si %xmm1, %rcx		; SSE-NEXT: cvttss2si %xmm1, %rcx
; SSE-NEXT: ucomiss %xmm2, %xmm1		; SSE-NEXT: movq %rax, %rdx
; SSE-NEXT: cmovaeq %rax, %rcx		; SSE-NEXT: sarq $63, %rdx
; SSE-NEXT: movq %rcx, %xmm1		; SSE-NEXT: andq %rcx, %rdx
		; SSE-NEXT: orq %rax, %rdx
		; SSE-NEXT: movq %rdx, %xmm1
; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; VEX-LABEL: fptoui_2f32_to_2i64_load:		; VEX-LABEL: fptoui_2f32_to_2i64_load:
; VEX: # %bb.0:		; VEX: # %bb.0:
; VEX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero		; VEX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; VEX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; VEX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; VEX-NEXT: vsubss %xmm1, %xmm0, %xmm2		; VEX-NEXT: vsubss %xmm1, %xmm0, %xmm2
; VEX-NEXT: vcvttss2si %xmm2, %rax		; VEX-NEXT: vcvttss2si %xmm2, %rax
; VEX-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000		; VEX-NEXT: vcvttss2si %xmm0, %rcx
; VEX-NEXT: xorq %rcx, %rax		; VEX-NEXT: movq %rcx, %rdx
; VEX-NEXT: vcvttss2si %xmm0, %rdx		; VEX-NEXT: sarq $63, %rdx
; VEX-NEXT: vucomiss %xmm1, %xmm0		; VEX-NEXT: andq %rax, %rdx
; VEX-NEXT: cmovaeq %rax, %rdx		; VEX-NEXT: orq %rcx, %rdx
; VEX-NEXT: vmovq %rdx, %xmm2		; VEX-NEXT: vmovq %rdx, %xmm2
; VEX-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]		; VEX-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
; VEX-NEXT: vsubss %xmm1, %xmm0, %xmm3		; VEX-NEXT: vsubss %xmm1, %xmm0, %xmm1
; VEX-NEXT: vcvttss2si %xmm3, %rax		; VEX-NEXT: vcvttss2si %xmm1, %rax
; VEX-NEXT: xorq %rcx, %rax
; VEX-NEXT: vcvttss2si %xmm0, %rcx		; VEX-NEXT: vcvttss2si %xmm0, %rcx
; VEX-NEXT: vucomiss %xmm1, %xmm0		; VEX-NEXT: movq %rcx, %rdx
; VEX-NEXT: cmovaeq %rax, %rcx		; VEX-NEXT: sarq $63, %rdx
; VEX-NEXT: vmovq %rcx, %xmm0		; VEX-NEXT: andq %rax, %rdx
		; VEX-NEXT: orq %rcx, %rdx
		; VEX-NEXT: vmovq %rdx, %xmm0
; VEX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]		; VEX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
; VEX-NEXT: retq		; VEX-NEXT: retq
;		;
; AVX512F-LABEL: fptoui_2f32_to_2i64_load:		; AVX512F-LABEL: fptoui_2f32_to_2i64_load:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero		; AVX512F-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX512F-NEXT: vcvttss2usi %xmm0, %rax		; AVX512F-NEXT: vcvttss2usi %xmm0, %rax
; AVX512F-NEXT: vmovq %rax, %xmm1		; AVX512F-NEXT: vmovq %rax, %xmm1
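The updated scalar f32 to i64 checks above replace the old `movabsq`/`xorq`/`ucomiss`/`cmovaeq` sequence with `cvttss2si`, `subss`, `cvttss2si`, `sarq $63`, `andq`, `orq`. As a rough illustration only, the sketch below models that data flow in portable C. The function names are hypothetical and not part of the patch, and `cvttss2si64_model` is an assumed software model of the instruction's out-of-range behavior (returning 0x8000000000000000), not the backend's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of 64-bit CVTTSS2SI: truncate toward zero, and return the
   "integer indefinite" value 0x8000000000000000 for NaN or out-of-range input. */
static int64_t cvttss2si64_model(float x) {
    const float two63 = 9223372036854775808.0f; /* exactly 2^63 */
    if (x != x || x >= two63 || x < -two63)
        return INT64_MIN;
    return (int64_t)x;
}

/* Hypothetical sketch of the new sequence:
   small | (cvtt(x - 2^63) & (small >>arith 63)). */
static uint64_t fptoui_f32_to_u64_sketch(float x) {
    const float two63 = 9223372036854775808.0f;
    int64_t small = cvttss2si64_model(x);         /* cvttss2si x           */
    int64_t big = cvttss2si64_model(x - two63);   /* subss, then cvttss2si */
    /* sarq $63: all ones if small is negative, else zero. Written as a
       conditional because right-shifting a negative value is
       implementation-defined in C. */
    uint64_t mask = small < 0 ? UINT64_MAX : 0;
    return (uint64_t)small | ((uint64_t)big & mask); /* andq, orq */
}

/* Usage example: values below and at or above 2^63. */
static void self_check(void) {
    assert(fptoui_f32_to_u64_sketch(1.0f) == 1u);
    assert(fptoui_f32_to_u64_sketch(9223372036854775808.0f) == (1ull << 63));
}
```

For inputs below 2^63 the first conversion is already the answer and the mask is zero. For inputs in [2^63, 2^64) the first conversion yields 0x8000000000000000, which both supplies the top bit and turns the mask on so that the low bits from the second conversion are OR-ed in.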

llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,SSE			; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX256NODQ,AVX1			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX256NODQ
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=bdver1 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX256NODQ,XOP			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=bdver1 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX256NODQ
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX256NODQ,AVX2			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX256NODQ
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skylake-avx512 -mattr=-prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX512,AVX512F			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skylake-avx512 -mattr=-prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX512F
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skylake-avx512 -mattr=+prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX512,AVX256DQ			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skylake-avx512 -mattr=+prefer-256-bit -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX,AVX256DQ

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	@src64 = common global [8 x double] zeroinitializer, align 64			@src64 = common global [8 x double] zeroinitializer, align 64
	@src32 = common global [16 x float] zeroinitializer, align 64			@src32 = common global [16 x float] zeroinitializer, align 64
	@dst64 = common global [8 x i64] zeroinitializer, align 64			@dst64 = common global [8 x i64] zeroinitializer, align 64
	@dst32 = common global [16 x i32] zeroinitializer, align 64			@dst32 = common global [16 x i32] zeroinitializer, align 64
	@dst16 = common global [32 x i16] zeroinitializer, align 64			@dst16 = common global [32 x i16] zeroinitializer, align 64
	; SSE-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8			; SSE-NEXT: [[TMP1:%.*]] = load <4 x double>, <4 x double>* bitcast ([8 x double]* @src64 to <4 x double>*), align 8
	; SSE-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8			; SSE-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* bitcast (double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4) to <4 x double>*), align 8
	; SSE-NEXT: [[TMP3:%.*]] = fptoui <4 x double> [[TMP1]] to <4 x i32>			; SSE-NEXT: [[TMP3:%.*]] = fptoui <4 x double> [[TMP1]] to <4 x i32>
	; SSE-NEXT: [[TMP4:%.*]] = fptoui <4 x double> [[TMP2]] to <4 x i32>			; SSE-NEXT: [[TMP4:%.*]] = fptoui <4 x double> [[TMP2]] to <4 x i32>
	; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4			; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([16 x i32]* @dst32 to <4 x i32>*), align 4
	; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4			; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 4
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX1-LABEL: @fptoui_8f64_8i32(			; AVX-LABEL: @fptoui_8f64_8i32(
	; AVX1-NEXT: [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8			; AVX-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
	; AVX1-NEXT: [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8			; AVX-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i32>
	; AVX1-NEXT: [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8			; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4
	; AVX1-NEXT: [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8			; AVX-NEXT: ret void
	; AVX1-NEXT: [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
	; AVX1-NEXT: [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
	; AVX1-NEXT: [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
	; AVX1-NEXT: [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
	; AVX1-NEXT: [[CVT0:%.*]] = fptoui double [[A0]] to i32
	; AVX1-NEXT: [[CVT1:%.*]] = fptoui double [[A1]] to i32
	; AVX1-NEXT: [[CVT2:%.*]] = fptoui double [[A2]] to i32
	; AVX1-NEXT: [[CVT3:%.*]] = fptoui double [[A3]] to i32
	; AVX1-NEXT: [[CVT4:%.*]] = fptoui double [[A4]] to i32
	; AVX1-NEXT: [[CVT5:%.*]] = fptoui double [[A5]] to i32
	; AVX1-NEXT: [[CVT6:%.*]] = fptoui double [[A6]] to i32
	; AVX1-NEXT: [[CVT7:%.*]] = fptoui double [[A7]] to i32
	; AVX1-NEXT: store i32 [[CVT0]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 0), align 4
	; AVX1-NEXT: store i32 [[CVT1]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 1), align 4
	; AVX1-NEXT: store i32 [[CVT2]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 2), align 4
	; AVX1-NEXT: store i32 [[CVT3]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 3), align 4
	; AVX1-NEXT: store i32 [[CVT4]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 4), align 4
	; AVX1-NEXT: store i32 [[CVT5]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 5), align 4
	; AVX1-NEXT: store i32 [[CVT6]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 6), align 4
	; AVX1-NEXT: store i32 [[CVT7]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @dst32, i32 0, i64 7), align 4
	; AVX1-NEXT: ret void
	;
	; XOP-LABEL: @fptoui_8f64_8i32(
	; XOP-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
	; XOP-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i32>
	; XOP-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4
	; XOP-NEXT: ret void
	;
	; AVX2-LABEL: @fptoui_8f64_8i32(
	; AVX2-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
	; AVX2-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i32>
	; AVX2-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4
	; AVX2-NEXT: ret void
	;
	; AVX512-LABEL: @fptoui_8f64_8i32(
	; AVX512-NEXT: [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
	; AVX512-NEXT: [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i32>
	; AVX512-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([16 x i32]* @dst32 to <8 x i32>*), align 4
	; AVX512-NEXT: ret void
	;			;
	%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8			%a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
	%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8			%a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
	%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8			%a2 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
	%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8			%a3 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
	%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8			%a4 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
	%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8			%a5 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
	%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8			%a6 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8

Revision Contents

Diff 358554

llvm/lib/Target/X86/X86ISelLowering.cpp

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

llvm/test/Analysis/CostModel/X86/fptoui.ll

llvm/test/CodeGen/X86/concat-cast.ll

llvm/test/CodeGen/X86/fptoui-sat-scalar.ll

llvm/test/CodeGen/X86/ftrunc.ll

llvm/test/CodeGen/X86/half.ll

llvm/test/CodeGen/X86/scalar-fp-to-i32.ll

llvm/test/CodeGen/X86/scalar-fp-to-i64.ll

llvm/test/CodeGen/X86/vec_cast3.ll

llvm/test/CodeGen/X86/vec_fp_to_int.ll

llvm/test/Transforms/SLPVectorizer/X86/fptoui.ll