This is an archive of the discontinued LLVM Phabricator instance.

Paths

- include/llvm/CodeGen/ISDOpcodes.h
- lib/CodeGen/SelectionDAG/DAGCombiner.cpp
- test/CodeGen/ARM/ftrunc.ll
- test/CodeGen/PowerPC/fp-int128-fp-combine.ll
- test/CodeGen/PowerPC/fp-to-int-to-fp.ll
- test/CodeGen/PowerPC/no-extra-fp-conv-ldst.ll
- test/CodeGen/X86/2011-10-19-widen_vselect.ll
- test/CodeGen/X86/ftrunc.ll

Differential D44909

[DAGCombine] (float)((int) f) --> ftrunc (PR36617)
Closed · Public

Authored by spatel on Mar 26 2018, 2:37 PM.


Details

Reviewers

scanon
efriedma
craig.topper
RKSimon
nemanjai
hfinkel
javed.absar

Commits

rG3d453ad7118a: [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
rG6124cae8f765: [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
rL330437: [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
rL328921: [DAGCombine] (float)((int) f) --> ftrunc (PR36617)

Summary

fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are UB.

Diff Detail

Event Timeline

spatel created this revision. · Mar 26 2018, 2:37 PM

Herald added subscribers: nhaehnle, mcrosier. · Mar 26 2018, 2:37 PM

This is the class of optimizations that I would call "formally allowed by the standard, but extremely likely to break things and surprise people." Which isn't to say that we shouldn't do it, just ... be prepared.

More of a concern (to me), this will be a significant perf regression for armv7 and non-SSE4.1 x86, and most other arches with hardware conversion but not trunc. We need some means to identify those targets and disable this (the double-conversion dance is frequently used specifically as an optimization to avoid calling trunc on such targets).

In D44909#1048692, @scanon wrote:

This is the class of optimizations that I would call "formally allowed by the standard, but extremely likely to break things and surprise people." Which isn't to say that we shouldn't do it, just ... be prepared.

More of a concern (to me), this will be a significant perf regression for armv7 and non-SSE4.1 x86, and most other arches with hardware conversion but not trunc. We need some means to identify those targets and disable this (the double-conversion dance is frequently used specifically as an optimization to avoid calling trunc on such targets).

Ah, I wasn't thinking hard enough about targets/types that don't have direct support. We really don't want to do this in IR then. Let me look at a DAG combine instead.

Patch updated:
Move the transform to the DAG and only do it when the target says ISD::FTRUNC is legal. This is a completely different patch from the last revision, but I'm keeping it under the same review number for context.

Two questions, to which I do not know the answer:

(a) Are the semantics of ISD::FP_TO_[US]INT for out-of-range values specified anywhere? They do not seem to be, but maybe I'm just missing it.
(b) Are there test cases that make sure this transform is *not* applied for armv7 and x86 -sse4.1?

In D44909#1048805, @scanon wrote:

Two questions, to which I do not know the answer:

(a) Are the semantics of ISD::FP_TO_[US]INT for out-of-range values specified anywhere? They do not seem to be, but maybe I'm just missing it.

No, I don't see anything either. We just have:

/// FP_TO_[US]INT - Convert a floating point value to a signed or unsigned
/// integer.

I was assuming these nodes follow the same rules as the IR instructions since they are mapped 1-to-1 in SelectionDAGBuilder::visitFPToSI() etc.

(b) Are there test cases that make sure this transform is *not* applied for armv7 and x86 -sse4.1?

I don't see coverage for armv7, but I may be overlooking it (cc @dmgreen). For x86, I also don't see any test of the roundtrip. I'll add a RUN line to the ftrunc.ll test file.

In D44909#1048811, @spatel wrote:
In D44909#1048805, @scanon wrote:

Two questions, to which I do not know the answer:

(a) Are the semantics of ISD::FP_TO_[US]INT for out-of-range values specified anywhere? They do not seem to be, but maybe I'm just missing it.

No, I don't see anything either. We just have:
/// FP_TO_[US]INT - Convert a floating point value to a signed or unsigned
/// integer.
I was assuming these nodes follow the same rules as the IR instructions since they are mapped 1-to-1 in SelectionDAGBuilder::visitFPToSI() etc.

I'll buy that. Can you add a note specifying this?

(b) Are there test cases that make sure this transform is *not* applied for armv7 and x86 -sse4.1?

I don't see coverage for armv7, but I may be overlooking it (cc @dmgreen). For x86, I also don't see any test of the roundtrip. I'll add a RUN line to the ftrunc.ll test file.

Great. With those two questions addressed, this LGTM.

In D44909#1048805, @scanon wrote:

Two questions, to which I do not know the answer:

(a) Are the semantics of ISD::FP_TO_[US]INT for out-of-range values specified anywhere? They do not seem to be, but maybe I'm just missing it.
(b) Are there test cases that make sure this transform is *not* applied for armv7 and x86 -sse4.1?

I think this is important to answer. I can see that on PPC, we use saturating FP -> Int conversions (i.e. values larger than 2^n - 1 produce the maximum integral value, etc.). That behaviour will certainly change with this combine.
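To make the PPC concern concrete, here is a hypothetical C model of a saturating double-to-int32 conversion. It is an assumption for illustration, not a description of any specific PPC instruction or of LLVM code: out-of-range inputs clamp to INT32_MIN/INT32_MAX, and NaN is mapped to 0 as an arbitrary placeholder. With such a conversion the cast pair turns 1e20 into 2147483647.0, while ftrunc returns 1e20 unchanged.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical saturating double -> int32 conversion (illustration only).
 * Out-of-range values clamp instead of being undefined. NaN -> 0 is an
 * arbitrary placeholder, not a claim about any real instruction. */
static int32_t saturating_to_i32(double x) {
    if (isnan(x))
        return 0;
    if (x >= 2147483648.0)      /* trunc(x) > INT32_MAX */
        return INT32_MAX;
    if (x <= -2147483649.0)     /* trunc(x) < INT32_MIN */
        return INT32_MIN;
    return (int32_t)x;          /* in range: plain truncation toward zero */
}

/* The fptosi/sitofp pair as it would behave with the conversion above. */
static double saturating_roundtrip(double x) {
    return (double)saturating_to_i32(x);
}
```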

Patch updated:

- Add a comment to the DAG node definitions to indicate they have the same behavior as their IR counterparts.
- Add a basic test file for ARMv7 (let me know if that needs adjusting or more coverage).
- Add a RUN line for x86 pre-SSE4.1 to ftrunc.ll to show that this patch doesn't change anything for that target.

Herald added a subscriber: javed.absar. · Mar 26 2018, 5:53 PM

Ah, OK. The language reference clearly states that unrepresentable values produce undefined results. As far as PPC is concerned, this LGTM.

P.S. If you think it's appropriate, perhaps it would be nice to add this PPC test case since PPC has FTRUNC for vectors as well:

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-unknown \
; RUN:   -verify-machineinstrs < %s | FileCheck %s

define <4 x float> @truncf32(<4 x float> %a) {
; CHECK-LABEL: truncf32:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    xvrspiz 34, 34
; CHECK-NEXT:    blr
entry:
  %0 = fptosi <4 x float> %a to <4 x i32>
  %1 = sitofp <4 x i32> %0 to <4 x float>
  ret <4 x float> %1
}

define <2 x double> @truncf64(<2 x double> %a) {
; CHECK-LABEL: truncf64:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    xvrdpiz 34, 34
; CHECK-NEXT:    blr
entry:
  %0 = fptosi <2 x double> %a to <2 x i64>
  %1 = sitofp <2 x i64> %0 to <2 x double>
  ret <2 x double> %1
}

define <4 x float> @truncf32u(<4 x float> %a) {
; CHECK-LABEL: truncf32u:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    xvrspiz 34, 34
; CHECK-NEXT:    blr
entry:
  %0 = fptoui <4 x float> %a to <4 x i32>
  %1 = uitofp <4 x i32> %0 to <4 x float>
  ret <4 x float> %1
}

define <2 x double> @truncf64u(<2 x double> %a) {
; CHECK-LABEL: truncf64u:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    xvrdpiz 34, 34
; CHECK-NEXT:    blr
entry:
  %0 = fptoui <2 x double> %a to <2 x i64>
  %1 = uitofp <2 x i64> %0 to <2 x double>
  ret <2 x double> %1
}

This revision is now accepted and ready to land. · Mar 26 2018, 6:26 PM

nhaehnle removed a subscriber: nhaehnle. · Mar 27 2018, 1:06 AM

Thanks for adding the new test. LGTM

test/CodeGen/ARM/ftrunc.ll
Line 2: Mind adding an armv8 target too? It should come up with something like a vrintz, if I'm understanding this correctly.
Line 25: This is fine as-is, but I think there is no double-to-i64 conversion instruction in ARM (there is in AArch64), hence the aeabi calls. There is a double-to-i32 conversion and back, but this is definitely testing something sensible, and the armv8 target should show a big improvement :)

spatel mentioned this in rL328682: [PowerPC] add ftrunc vector tests; NFC. · Mar 27 2018, 5:52 PM

spatel mentioned this in rL328683: [AArch64] add ftrunc tests; NFC. · Mar 27 2018, 5:58 PM

Closed by commit rL328921: [DAGCombine] (float)((int) f) --> ftrunc (PR36617) (authored by spatel). · Mar 31 2018, 10:59 AM

This revision was automatically updated to reflect the committed changes.

This broke some large integration test on arm64 in Chromium (https://crbug.com/831145 – doesn't yet have any information other than "something is broken" and that we bisected it to this revision). It sounds like the patch got rewritten during review; does "formally allowed by the standard, but extremely likely to break things and surprise people" still apply? It sounds like this is supposed to be behavior-preserving for values that aren't NaN and Inf, but it does change behavior for those two? (If so, in what way?)

In D44909#1065089, @thakis wrote:

This broke some large integration test on arm64 in Chromium (https://crbug.com/831145 – doesn't yet have any information other than "something is broken" and that we bisected it to this revision). It sounds like the patch got rewritten during review; does "formally allowed by the standard, but extremely likely to break things and surprise people" still apply? It sounds like this is supposed to be behavior-preserving for values that aren't NaN and Inf, but it does change behavior for those two? (If so, in what way?)

The patch started off as an IR transform, and the quoted comment applied only to cases (I hope) where the target doesn’t have native support for an ftrunc instruction. NaN/inf inputs produce undefined results with these ops, so if something was relying on that undefined output, that could be the source of the problem.

I’m not an arm64 expert, but the tests show that we should codegen ‘frintz’ instructions with this patch. So that would be the place to look for diffs.

In D44909#1065109, @spatel wrote:

The patch started off as an IR transform, and the quoted comment applied only to cases (I hope) where the target doesn’t have native support for an ftrunc instruction. NaN/inf inputs produce undefined results with these ops, so if something was relying on that undefined output, that could be the source of the problem.

I’m not an arm64 expert, but the tests show that we should codegen ‘frintz’ instructions with this patch. So that would be the place to look for diffs.

We found it: https://bugs.chromium.org/p/chromium/issues/detail?id=831145#c48

It's exactly the "formally allowed by the standard, but extremely likely to break things and surprise people" situation :-) The question is just what the appropriate fix is...

In D44909#1065600, @hans wrote:

We found it: https://bugs.chromium.org/p/chromium/issues/detail?id=831145#c48

It's exactly the "formally allowed by the standard, but extremely likely to break things and surprise people" situation :-) The question is just what the appropriate fix is...

Aha - nice detective work. :)
If it helps, I can revert this while fixing the sources. Can this pattern be added to the UB sanitizer? Or maybe it's already there, but wasn't used?

echristo added a subscriber: echristo. · Apr 12 2018, 7:53 AM

We think there's a UB check for this already, but v8 isn't UB-clean.

The fast way to do the "is this double an int" check on an assembly level is to convert to int and back and then compare if the roundtripped value is equal. After this change, is there some way to express this approach in C?

spatel mentioned this in rL329920: revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617). · Apr 12 2018, 8:30 AM

Reopening - reverted at rL329920.

This revision is now accepted and ready to land. · Apr 12 2018, 8:34 AM

In D44909#1065673, @thakis wrote:

We think there's a UB check for this already, but v8 isn't UB-clean.

The fast way to do the "is this double an int" check on an assembly level is to convert to int and back and then compare if the roundtripped value is equal. After this change, is there some way to express this approach in C?

Still trying to digest this:
https://wiki.sei.cmu.edu/confluence/display/c/FLP34-C.+Ensure+that+floating-point+conversions+are+within+range+of+the+new+type

After this change, is there some way to express this approach in C?

You can use platform-specific intrinsics, e.g. _mm_cvtsd_si32 is defined to return INT_MIN for out-of-range values. Or, in portable C, you can do a range check: if (x <= INT_MAX && x >= INT_MIN && (double)(int)x == x).

In D44909#1065786, @efriedma wrote:

After this change, is there some way to express this approach in C?

You can use platform-specific intrinsics, e.g. _mm_cvtsd_si32 is defined to return INT_MIN for out-of-range values. Or, in portable C, you can do a range check: if (x <= INT_MAX && x >= INT_MIN && (double)(int)x == x).

A templated C++ option is shown as an example here:
https://stackoverflow.com/questions/2544394/c-floating-point-to-integer-type-conversions

Or, in portable C, you can do a range check: if (x <= INT_MAX && x >= INT_MIN && (double)(int)x == x).

Sorry, the bounds on this aren't precisely correct; you want something more like if (x < INT_MAX+1LL && x > INT_MIN-1LL && (double)(int)x == x). And note this depends on the fact that INT_MIN-1LL can be exactly represented as a double.
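A self-contained C sketch of the range-checked test discussed above, assuming a 32-bit int and IEEE 754 doubles; the function name is hypothetical. Both bounds are strict because only inputs in the open interval (INT_MIN - 1, INT_MAX + 1) truncate to a value an int can hold: INT_MIN - 1 itself would not fit. Both bounds convert to double exactly, NaN fails the comparisons, and && short-circuits before the cast, so no input reaches undefined behavior.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (illustration only): true when x holds exactly an int
 * value. The range tests run first, so the (int) cast only ever sees inputs
 * whose truncation fits in int. */
static bool double_is_int(double x) {
    return x > INT_MIN - 1LL &&      /* strict: INT_MIN - 1 does not fit */
           x < INT_MAX + 1LL &&      /* strict: INT_MAX + 1 does not fit */
           (double)(int)x == x;      /* exact round trip only for integers */
}
```

Clang's -fsanitize=float-cast-overflow check should flag the unguarded form, (double)(int)x == x, at run time when x is out of range, which is one way to find code that still relies on the old pattern.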

I see that patches were applied to the Chromium sources:
https://bugs.chromium.org/p/chromium/issues/detail?id=831145#c59

Is it ok to re-commit this and see what else breaks? :)

In D44909#1072197, @spatel wrote:

I see that patches were applied to the Chromium sources:
https://bugs.chromium.org/p/chromium/issues/detail?id=831145#c59

Is it ok to re-commit this and see what else breaks? :)

Sounds okay to me. It would probably be good to add a release notes entry about this though, explaining what can break and showing how to use UBSan to check for it. I can easily imagine other code relying on the same pattern that V8 used.

Closed by commit rL330437: [DAGCombine] (float)((int) f) --> ftrunc (PR36617) (authored by spatel). · Apr 20 2018, 8:11 AM

This revision was automatically updated to reflect the committed changes.

Herald added a reviewer: javed.absar. · Apr 20 2018, 8:11 AM

What is the expected behavior for casting float values in range (-1.0, 0.0) from float->int->float? If the expected value is -0.0 (instead of 0.0), what is the reason for that?

In D44909#1126945, @rdhindsa wrote:

What is the expected behavior for casting float values in range (-1.0, 0.0) from float->int->float? If the expected value is -0.0 (instead of 0.0), what is the reason for that?

Not sure I understand the question. Casting FP (-1.0, 0.0) to int with round-to-zero mode results in integer 0. Converting that back to FP is always 0.0.

In D44909#1128334, @spatel wrote:

Not sure I understand the question. Casting FP (-1.0, 0.0) to int with round-to-zero mode results in integer 0. Converting that back to FP is always 0.0.

Ah, I see - we get this wrong because we may now produce -0.0 instead of 0.0.

I.e., trunc(), ISD::FTRUNC, and presumably all of the hardware instructions that map to those ops play by the IEEE 754 rounding rules:
"These operations convert zero operands to zero results of the same sign."

So we can't do this without 'nsz', right?

(FWIW since this was enabled again we had at least two bugs due to this -- https://crbug.com/845816 https://crbug.com/851415 -- and one confused thread at https://groups.google.com/a/chromium.org/forum/#!topic/cxx/W5wma_HXWOo )

spatel mentioned this in D48085: [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros. · Jun 12 2018, 10:39 AM

spatel mentioned this in rL335761: [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros. · Jun 27 2018, 11:21 AM

Revision Contents

- include/llvm/CodeGen/ISDOpcodes.h (3 lines)
- lib/CodeGen/SelectionDAG/DAGCombiner.cpp (18 lines)
- test/CodeGen/ARM/ftrunc.ll (29 lines)
- test/CodeGen/PowerPC/fp-int128-fp-combine.ll (13 lines)
- test/CodeGen/PowerPC/fp-to-int-to-fp.ll (12 lines)
- test/CodeGen/PowerPC/no-extra-fp-conv-ldst.ll (24 lines)
- test/CodeGen/X86/2011-10-19-widen_vselect.ll (6 lines)
- test/CodeGen/X86/ftrunc.ll (292 lines)

Diff 139871

include/llvm/CodeGen/ISDOpcodes.h

[489 lines not shown]
@@ ... @@ enum NodeType {
     /// result type must have fewer elements than the operand type, and those
     /// elements must be larger integer types such that the total size of the
     /// operand type and the result type match. Each of the low operand
     /// elements is zero-extended into the corresponding, wider result
     /// elements.
     ZERO_EXTEND_VECTOR_INREG,
 
     /// FP_TO_[US]INT - Convert a floating point value to a signed or unsigned
-    /// integer.
+    /// integer. These have the same semantics as fptosi and fptoui in IR. If
+    /// the FP value cannot fit in the integer type, the results are undefined.
     FP_TO_SINT,
     FP_TO_UINT,
 
     /// X = FP_ROUND(Y, TRUNC) - Rounding 'Y' from a larger floating point type
     /// down to the precision of the destination VT. TRUNC is a flag, which is
     /// always an integer that is zero or one. If TRUNC is 0, this is a
     /// normal rounding, if it is 1, this FP_ROUND is known to not change the
     /// value of Y.
[491 lines not shown]

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[10,739 lines not shown]
@@ ... @@ if (N0.getOpcode() == ISD::ZERO_EXTEND &&
       SDValue Ops[] =
         { N0.getOperand(0).getOperand(0), N0.getOperand(0).getOperand(1),
           DAG.getConstantFP(1.0, DL, VT), DAG.getConstantFP(0.0, DL, VT),
           N0.getOperand(0).getOperand(2) };
       return DAG.getNode(ISD::SELECT_CC, DL, VT, Ops);
     }
   }
 
+  // fptosi rounds towards zero, so converting from FP to integer and back is
+  // the same as an 'ftrunc': sitofp (fptosi X) --> ftrunc X
+  // We only do this if the target has legal ftrunc, otherwise we'd likely be
+  // replacing casts with a libcall.
+  if (N0.getOpcode() == ISD::FP_TO_SINT &&
+      N0.getOperand(0).getValueType() == VT &&
+      TLI.isOperationLegal(ISD::FTRUNC, VT))
+    return DAG.getNode(ISD::FTRUNC, SDLoc(N), VT, N0.getOperand(0));
+
   return SDValue();
 }
 
 SDValue DAGCombiner::visitUINT_TO_FP(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N->getValueType(0);
   EVT OpVT = N0.getValueType();
 
[23 lines not shown]
@@ ... @@ if (N0.getOpcode() == ISD::SETCC && !VT.isVector() &&
       SDValue Ops[] =
         { N0.getOperand(0), N0.getOperand(1),
           DAG.getConstantFP(1.0, DL, VT), DAG.getConstantFP(0.0, DL, VT),
           N0.getOperand(2) };
       return DAG.getNode(ISD::SELECT_CC, DL, VT, Ops);
     }
   }
 
+  // fptoui rounds towards zero, so converting from FP to integer and back is
+  // the same as an 'ftrunc': uitofp (fptoui X) --> ftrunc X
+  // We only do this if the target has legal ftrunc, otherwise we'd likely be
+  // replacing casts with a libcall.
+  if (N0.getOpcode() == ISD::FP_TO_UINT &&
+      N0.getOperand(0).getValueType() == VT &&
+      TLI.isOperationLegal(ISD::FTRUNC, VT))
+    return DAG.getNode(ISD::FTRUNC, SDLoc(N), VT, N0.getOperand(0));
+
   return SDValue();
 }
 
 // Fold (fp_to_{s/u}int ({s/u}int_to_fpx)) -> zext x, sext x, trunc x, or x
 static SDValue FoldIntToFPToInt(SDNode *N, SelectionDAG &DAG) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N->getValueType(0);
 
[6,923 lines not shown]

test/CodeGen/ARM/ftrunc.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=armv7-eabi < %s | FileCheck %s

define float @trunc_unsigned_f32(float %x) nounwind {
; CHECK-LABEL: trunc_unsigned_f32:
; CHECK: @ %bb.0:
; CHECK-NEXT: vmov s0, r0
; CHECK-NEXT: vcvt.u32.f32 s0, s0
; CHECK-NEXT: vcvt.f32.u32 s0, s0
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: bx lr
  %i = fptoui float %x to i32
  %r = uitofp i32 %i to float
  ret float %r
}

define double @trunc_unsigned_f64(double %x) nounwind {
; CHECK-LABEL: trunc_unsigned_f64:
; CHECK: @ %bb.0:
; CHECK-NEXT: .save {r11, lr}
; CHECK-NEXT: push {r11, lr}
; CHECK-NEXT: bl __aeabi_d2ulz
; CHECK-NEXT: bl __aeabi_ul2d
; CHECK-NEXT: pop {r11, pc}
  %i = fptoui double %x to i64
  %r = uitofp i64 %i to double
  ret double %r
}

test/CodeGen/PowerPC/fp-int128-fp-combine.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -O0 -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s
 
 ; xscvdpsxds should NOT be emitted, since it saturates the result down to i64.
 define float @f_i128_f(float %v) {
 ; CHECK-LABEL: f_i128_f:
 ; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: mflr 0
-; CHECK-NEXT: std 0, 16(1)
-; CHECK-NEXT: stdu 1, -32(1)
-; CHECK-NEXT: .cfi_def_cfa_offset 32
-; CHECK-NEXT: .cfi_offset lr, 16
-; CHECK-NEXT: bl __fixsfti
-; CHECK-NEXT: nop
-; CHECK-NEXT: bl __floattisf
-; CHECK-NEXT: nop
-; CHECK-NEXT: addi 1, 1, 32
-; CHECK-NEXT: ld 0, 16(1)
-; CHECK-NEXT: mtlr 0
+; CHECK-NEXT: friz 1, 1
 ; CHECK-NEXT: blr
 entry:
   %a = fptosi float %v to i128
   %b = sitofp i128 %a to float
   ret float %b
 }

test/CodeGen/PowerPC/fp-to-int-to-fp.ll

 ; RUN: llc -verify-machineinstrs -mcpu=a2 < %s | FileCheck %s -check-prefix=FPCVT
 ; RUN: llc -verify-machineinstrs -mcpu=ppc64 < %s | FileCheck %s -check-prefix=PPC64
 target datalayout = "E-m:e-i64:64-n32:64"
 target triple = "powerpc64-unknown-linux-gnu"
 
 ; Function Attrs: nounwind readnone
 define float @fool(float %X) #0 {
 entry:
   %conv = fptosi float %X to i64
   %conv1 = sitofp i64 %conv to float
   ret float %conv1
 
 ; FPCVT-LABEL: @fool
-; FPCVT: fctidz [[REG1:[0-9]+]], 1
-; FPCVT: fcfids 1, [[REG1]]
+; FPCVT: friz 1, 1
 ; FPCVT: blr
 
 ; PPC64-LABEL: @fool
 ; PPC64: fctidz [[REG1:[0-9]+]], 1
 ; PPC64: fcfid [[REG2:[0-9]+]], [[REG1]]
 ; PPC64: frsp 1, [[REG2]]
 ; PPC64: blr
 }
 
 ; Function Attrs: nounwind readnone
 define double @foodl(double %X) #0 {
 entry:
   %conv = fptosi double %X to i64
   %conv1 = sitofp i64 %conv to double
   ret double %conv1
 
 ; FPCVT-LABEL: @foodl
-; FPCVT: fctidz [[REG1:[0-9]+]], 1
-; FPCVT: fcfid 1, [[REG1]]
+; FPCVT: friz 1, 1
 ; FPCVT: blr
 
 ; PPC64-LABEL: @foodl
 ; PPC64: fctidz [[REG1:[0-9]+]], 1
 ; PPC64: fcfid 1, [[REG1]]
 ; PPC64: blr
 }
 
 ; Function Attrs: nounwind readnone
 define float @fooul(float %X) #0 {
 entry:
   %conv = fptoui float %X to i64
   %conv1 = uitofp i64 %conv to float
   ret float %conv1
 
 ; FPCVT-LABEL: @fooul
-; FPCVT: fctiduz [[REG1:[0-9]+]], 1
-; FPCVT: fcfidus 1, [[REG1]]
+; FPCVT: friz 1, 1
 ; FPCVT: blr
 }
 
 ; Function Attrs: nounwind readnone
 define double @fooudl(double %X) #0 {
 entry:
   %conv = fptoui double %X to i64
   %conv1 = uitofp i64 %conv to double
   ret double %conv1
 
 ; FPCVT-LABEL: @fooudl
-; FPCVT: fctiduz [[REG1:[0-9]+]], 1
-; FPCVT: fcfidu 1, [[REG1]]
+; FPCVT: friz 1, 1
 ; FPCVT: blr
 }
 
 attributes #0 = { nounwind readnone }

test/CodeGen/PowerPC/no-extra-fp-conv-ldst.ll

[30 lines not shown]
 ; Function Attrs: nounwind readnone
 define float @foo(float %X) #0 {
 entry:
   %conv = fptosi float %X to i32
   %conv1 = sitofp i32 %conv to float
   ret float %conv1
 
 ; CHECK-LABEL: @foo
-; CHECK-DAG: fctiwz [[REG2:[0-9]+]], 1
-; CHECK-DAG: addi [[REG1:[0-9]+]], 1,
-; CHECK: stfiwx [[REG2]], 0, [[REG1]]
-; CHECK: lfiwax [[REG3:[0-9]+]], 0, [[REG1]]
-; CHECK: fcfids 1, [[REG3]]
+; CHECK: friz 1, 1
 ; CHECK: blr
 }
 
 ; Function Attrs: nounwind readnone
 define double @food(double %X) #0 {
 entry:
   %conv = fptosi double %X to i32
   %conv1 = sitofp i32 %conv to double
   ret double %conv1
 
 ; CHECK-LABEL: @food
-; CHECK-DAG: fctiwz [[REG2:[0-9]+]], 1
-; CHECK-DAG: addi [[REG1:[0-9]+]], 1,
-; CHECK: stfiwx [[REG2]], 0, [[REG1]]
-; CHECK: lfiwax [[REG3:[0-9]+]], 0, [[REG1]]
-; CHECK: fcfid 1, [[REG3]]
+; CHECK: friz 1, 1
 ; CHECK: blr
 }
 
 ; Function Attrs: nounwind readnone
 define float @foou(float %X) #0 {
 entry:
   %conv = fptoui float %X to i32
   %conv1 = uitofp i32 %conv to float
   ret float %conv1
 
 ; CHECK-LABEL: @foou
-; CHECK-DAG: fctiwuz [[REG2:[0-9]+]], 1
-; CHECK-DAG: addi [[REG1:[0-9]+]], 1,
-; CHECK: stfiwx [[REG2]], 0, [[REG1]]
-; CHECK: lfiwzx [[REG3:[0-9]+]], 0, [[REG1]]
-; CHECK: fcfidus 1, [[REG3]]
+; CHECK: friz 1, 1
 ; CHECK: blr
 }
 
 ; Function Attrs: nounwind readnone
 define double @fooud(double %X) #0 {
 entry:
   %conv = fptoui double %X to i32
   %conv1 = uitofp i32 %conv to double
   ret double %conv1
 
 ; CHECK-LABEL: @fooud
-; CHECK-DAG: fctiwuz [[REG2:[0-9]+]], 1
-; CHECK-DAG: addi [[REG1:[0-9]+]], 1,
-; CHECK: stfiwx [[REG2]], 0, [[REG1]]
-; CHECK: lfiwzx [[REG3:[0-9]+]], 0, [[REG1]]
-; CHECK: fcfidu 1, [[REG3]]
+; CHECK: friz 1, 1
 ; CHECK: blr
 }
 
 attributes #0 = { nounwind readonly }

test/CodeGen/X86/2011-10-19-widen_vselect.ll

[65 lines not shown]
 }
 
 define void @full_test() {
 ; X32-LABEL: full_test:
 ; X32: # %bb.0: # %entry
 ; X32-NEXT: subl $60, %esp
 ; X32-NEXT: .cfi_def_cfa_offset 64
 ; X32-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
-; X32-NEXT: cvttps2dq %xmm2, %xmm0
-; X32-NEXT: cvtdq2ps %xmm0, %xmm1
+; X32-NEXT: roundps $11, %xmm2, %xmm1
 ; X32-NEXT: xorps %xmm0, %xmm0
 ; X32-NEXT: cmpltps %xmm2, %xmm0
 ; X32-NEXT: movaps {{.*#+}} xmm3 = <1,1,u,u>
 ; X32-NEXT: addps %xmm1, %xmm3
 ; X32-NEXT: movaps %xmm1, %xmm4
 ; X32-NEXT: blendvps %xmm0, %xmm3, %xmm4
 ; X32-NEXT: cmpeqps %xmm2, %xmm1
 ; X32-NEXT: movaps %xmm1, %xmm0
 ; X32-NEXT: blendvps %xmm0, %xmm2, %xmm4
 ; X32-NEXT: movss %xmm4, {{[0-9]+}}(%esp)
 ; X32-NEXT: movshdup {{.*#+}} xmm0 = xmm4[1,1,3,3]
 ; X32-NEXT: movss %xmm0, {{[0-9]+}}(%esp)
 ; X32-NEXT: movss %xmm4, {{[0-9]+}}(%esp)
 ; X32-NEXT: movss %xmm0, {{[0-9]+}}(%esp)
 ; X32-NEXT: addl $60, %esp
 ; X32-NEXT: retl
 ;
 ; X64-LABEL: full_test:
 ; X64: # %bb.0: # %entry
 ; X64-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
-; X64-NEXT: cvttps2dq %xmm2, %xmm0
-; X64-NEXT: cvtdq2ps %xmm0, %xmm1
+; X64-NEXT: roundps $11, %xmm2, %xmm1
 ; X64-NEXT: xorps %xmm0, %xmm0
 ; X64-NEXT: cmpltps %xmm2, %xmm0
 ; X64-NEXT: movaps {{.*#+}} xmm3 = <1,1,u,u>
 ; X64-NEXT: addps %xmm1, %xmm3
 ; X64-NEXT: movaps %xmm1, %xmm4
 ; X64-NEXT: blendvps %xmm0, %xmm3, %xmm4
	; X64-NEXT: cmpeqps %xmm2, %xmm1			; X64-NEXT: cmpeqps %xmm2, %xmm1
	; X64-NEXT: movaps %xmm1, %xmm0			; X64-NEXT: movaps %xmm1, %xmm0
	...
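In the x86 checks, SSE4.1 and AVX targets now emit roundps/roundss/roundsd/roundpd (or the v-prefixed forms) with immediate $11. The Intel SDM describes the imm8 operand of the SSE4.1 ROUND* instructions as follows:

- imm8[1:0] selects the rounding mode: 0 nearest, 1 down, 2 up, 3 toward zero.
- Setting imm8[2] uses MXCSR.RC instead of that mode.
- Setting imm8[3] suppresses the precision (inexact) exception.

So 11 = 0b1011 means truncate, ignore MXCSR, and do not signal inexact. That is the truncation the combine asks for. Below is a small decoder sketch. `RoundImm`, `RoundMode`, and `decode_round_imm` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Rounding modes selected by imm8[1:0] of ROUNDPS/ROUNDSS/ROUNDPD/ROUNDSD.
enum class RoundMode { Nearest = 0, Down = 1, Up = 2, TowardZero = 3 };

struct RoundImm {
    RoundMode mode;         // imm8[1:0]
    bool use_mxcsr;         // imm8[2]: if set, MXCSR.RC overrides 'mode'
    bool suppress_inexact;  // imm8[3]: if set, no precision exception
};

RoundImm decode_round_imm(std::uint8_t imm) {
    // This sketch only models imm8[3:0]; the upper four bits are left out.
    assert((imm & 0xF0) == 0);
    return RoundImm{static_cast<RoundMode>(imm & 0x3),
                    (imm & 0x4) != 0,
                    (imm & 0x8) != 0};
}
```

Decoding 11 gives `RoundMode::TowardZero` with `use_mxcsr == false` and `suppress_inexact == true`.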

test/CodeGen/X86/ftrunc.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=SSE2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=SSE2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=SSE41			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=SSE41
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=AVX1			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=AVX1

	define float @trunc_unsigned_f32(float %x) nounwind {			define float @trunc_unsigned_f32(float %x) nounwind {
	; SSE2-LABEL: trunc_unsigned_f32:			; SSE2-LABEL: trunc_unsigned_f32:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: cvttss2si %xmm0, %rax			; SSE2-NEXT: cvttss2si %xmm0, %rax
	; SSE2-NEXT: movl %eax, %eax			; SSE2-NEXT: movl %eax, %eax
	; SSE2-NEXT: xorps %xmm0, %xmm0			; SSE2-NEXT: xorps %xmm0, %xmm0
	; SSE2-NEXT: cvtsi2ssq %rax, %xmm0			; SSE2-NEXT: cvtsi2ssq %rax, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_unsigned_f32:			; SSE41-LABEL: trunc_unsigned_f32:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: cvttss2si %xmm0, %rax			; SSE41-NEXT: roundss $11, %xmm0, %xmm0
	; SSE41-NEXT: movl %eax, %eax
	; SSE41-NEXT: xorps %xmm0, %xmm0
	; SSE41-NEXT: cvtsi2ssq %rax, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_unsigned_f32:			; AVX1-LABEL: trunc_unsigned_f32:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vcvttss2si %xmm0, %rax			; AVX1-NEXT: vroundss $11, %xmm0, %xmm0, %xmm0
	; AVX1-NEXT: movl %eax, %eax
	; AVX1-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptoui float %x to i32			%i = fptoui float %x to i32
	%r = uitofp i32 %i to float			%r = uitofp i32 %i to float
	ret float %r			ret float %r
	}			}

	define double @trunc_unsigned_f64(double %x) nounwind {			define double @trunc_unsigned_f64(double %x) nounwind {
	; SSE2-LABEL: trunc_unsigned_f64:			; SSE2-LABEL: trunc_unsigned_f64:
	...
	; SSE2-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[1],mem[1]			; SSE2-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[1],mem[1]
	; SSE2-NEXT: subpd {{.*}}(%rip), %xmm1			; SSE2-NEXT: subpd {{.*}}(%rip), %xmm1
	; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[2,3,0,1]			; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[2,3,0,1]
	; SSE2-NEXT: addpd %xmm1, %xmm0			; SSE2-NEXT: addpd %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_unsigned_f64:			; SSE41-LABEL: trunc_unsigned_f64:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE41-NEXT: roundsd $11, %xmm0, %xmm0
	; SSE41-NEXT: movapd %xmm0, %xmm2
	; SSE41-NEXT: subsd %xmm1, %xmm2
	; SSE41-NEXT: cvttsd2si %xmm2, %rax
	; SSE41-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
	; SSE41-NEXT: xorq %rax, %rcx
	; SSE41-NEXT: cvttsd2si %xmm0, %rax
	; SSE41-NEXT: ucomisd %xmm1, %xmm0
	; SSE41-NEXT: cmovaeq %rcx, %rax
	; SSE41-NEXT: movq %rax, %xmm0
	; SSE41-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
	; SSE41-NEXT: subpd {{.*}}(%rip), %xmm0
	; SSE41-NEXT: haddpd %xmm0, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_unsigned_f64:			; AVX1-LABEL: trunc_unsigned_f64:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX1-NEXT: vroundsd $11, %xmm0, %xmm0, %xmm0
	; AVX1-NEXT: vsubsd %xmm1, %xmm0, %xmm2
	; AVX1-NEXT: vcvttsd2si %xmm2, %rax
	; AVX1-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
	; AVX1-NEXT: xorq %rax, %rcx
	; AVX1-NEXT: vcvttsd2si %xmm0, %rax
	; AVX1-NEXT: vucomisd %xmm1, %xmm0
	; AVX1-NEXT: cmovaeq %rcx, %rax
	; AVX1-NEXT: vmovq %rax, %xmm0
	; AVX1-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
	; AVX1-NEXT: vsubpd {{.*}}(%rip), %xmm0, %xmm0
	; AVX1-NEXT: vhaddpd %xmm0, %xmm0, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptoui double %x to i64			%i = fptoui double %x to i64
	%r = uitofp i64 %i to double			%r = uitofp i64 %i to double
	ret double %r			ret double %r
	}			}

	define <4 x float> @trunc_unsigned_v4f32(<4 x float> %x) nounwind {			define <4 x float> @trunc_unsigned_v4f32(<4 x float> %x) nounwind {
	; SSE2-LABEL: trunc_unsigned_v4f32:			; SSE2-LABEL: trunc_unsigned_v4f32:
	...
	; SSE2-NEXT: por {{.*}}(%rip), %xmm1			; SSE2-NEXT: por {{.*}}(%rip), %xmm1
	; SSE2-NEXT: addps {{.*}}(%rip), %xmm1			; SSE2-NEXT: addps {{.*}}(%rip), %xmm1
	; SSE2-NEXT: addps %xmm0, %xmm1			; SSE2-NEXT: addps %xmm0, %xmm1
	; SSE2-NEXT: movaps %xmm1, %xmm0			; SSE2-NEXT: movaps %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_unsigned_v4f32:			; SSE41-LABEL: trunc_unsigned_v4f32:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: movshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]			; SSE41-NEXT: roundps $11, %xmm0, %xmm0
	; SSE41-NEXT: cvttss2si %xmm1, %rax
	; SSE41-NEXT: cvttss2si %xmm0, %rcx
	; SSE41-NEXT: movd %ecx, %xmm1
	; SSE41-NEXT: pinsrd $1, %eax, %xmm1
	; SSE41-NEXT: movaps %xmm0, %xmm2
	; SSE41-NEXT: movhlps {{.*#+}} xmm2 = xmm0[1],xmm2[1]
	; SSE41-NEXT: cvttss2si %xmm2, %rax
	; SSE41-NEXT: pinsrd $2, %eax, %xmm1
	; SSE41-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,1,2,3]
	; SSE41-NEXT: cvttss2si %xmm0, %rax
	; SSE41-NEXT: pinsrd $3, %eax, %xmm1
	; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [1258291200,1258291200,1258291200,1258291200]
	; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0],xmm0[1],xmm1[2],xmm0[3],xmm1[4],xmm0[5],xmm1[6],xmm0[7]
	; SSE41-NEXT: psrld $16, %xmm1
	; SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm1[0],mem[1],xmm1[2],mem[3],xmm1[4],mem[5],xmm1[6],mem[7]
	; SSE41-NEXT: addps {{.*}}(%rip), %xmm1
	; SSE41-NEXT: addps %xmm0, %xmm1
	; SSE41-NEXT: movaps %xmm1, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_unsigned_v4f32:			; AVX1-LABEL: trunc_unsigned_v4f32:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]			; AVX1-NEXT: vroundps $11, %xmm0, %xmm0
	; AVX1-NEXT: vcvttss2si %xmm1, %rax
	; AVX1-NEXT: vcvttss2si %xmm0, %rcx
	; AVX1-NEXT: vmovd %ecx, %xmm1
	; AVX1-NEXT: vpinsrd $1, %eax, %xmm1, %xmm1
	; AVX1-NEXT: vpermilpd {{.*#+}} xmm2 = xmm0[1,0]
	; AVX1-NEXT: vcvttss2si %xmm2, %rax
	; AVX1-NEXT: vpinsrd $2, %eax, %xmm1, %xmm1
	; AVX1-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,1,2,3]
	; AVX1-NEXT: vcvttss2si %xmm0, %rax
	; AVX1-NEXT: vpinsrd $3, %eax, %xmm1, %xmm0
	; AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
	; AVX1-NEXT: vpsrld $16, %xmm0, %xmm0
	; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
	; AVX1-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0
	; AVX1-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptoui <4 x float> %x to <4 x i32>			%i = fptoui <4 x float> %x to <4 x i32>
	%r = uitofp <4 x i32> %i to <4 x float>			%r = uitofp <4 x i32> %i to <4 x float>
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <2 x double> @trunc_unsigned_v2f64(<2 x double> %x) nounwind {			define <2 x double> @trunc_unsigned_v2f64(<2 x double> %x) nounwind {
	; SSE2-LABEL: trunc_unsigned_v2f64:			; SSE2-LABEL: trunc_unsigned_v2f64:
	...
	; SSE2-NEXT: subpd %xmm3, %xmm1			; SSE2-NEXT: subpd %xmm3, %xmm1
	; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm1[2,3,0,1]			; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm1[2,3,0,1]
	; SSE2-NEXT: addpd %xmm1, %xmm2			; SSE2-NEXT: addpd %xmm1, %xmm2
	; SSE2-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]			; SSE2-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_unsigned_v2f64:			; SSE41-LABEL: trunc_unsigned_v2f64:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: movaps %xmm0, %xmm1			; SSE41-NEXT: roundpd $11, %xmm0, %xmm0
	; SSE41-NEXT: movhlps {{.*#+}} xmm1 = xmm0[1],xmm1[1]
	; SSE41-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
	; SSE41-NEXT: movaps %xmm1, %xmm3
	; SSE41-NEXT: subsd %xmm2, %xmm3
	; SSE41-NEXT: cvttsd2si %xmm3, %rax
	; SSE41-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
	; SSE41-NEXT: xorq %rcx, %rax
	; SSE41-NEXT: cvttsd2si %xmm1, %rdx
	; SSE41-NEXT: ucomisd %xmm2, %xmm1
	; SSE41-NEXT: cmovaeq %rax, %rdx
	; SSE41-NEXT: movaps %xmm0, %xmm1
	; SSE41-NEXT: subsd %xmm2, %xmm1
	; SSE41-NEXT: cvttsd2si %xmm1, %rax
	; SSE41-NEXT: xorq %rcx, %rax
	; SSE41-NEXT: cvttsd2si %xmm0, %rcx
	; SSE41-NEXT: ucomisd %xmm2, %xmm0
	; SSE41-NEXT: cmovaeq %rax, %rcx
	; SSE41-NEXT: movq %rcx, %xmm0
	; SSE41-NEXT: movdqa {{.*#+}} xmm1 = [1127219200,1160773632,0,0]
	; SSE41-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	; SSE41-NEXT: movapd {{.*#+}} xmm2 = [4.503600e+15,1.934281e+25]
	; SSE41-NEXT: subpd %xmm2, %xmm0
	; SSE41-NEXT: movq %rdx, %xmm3
	; SSE41-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm1[0],xmm3[1],xmm1[1]
	; SSE41-NEXT: subpd %xmm2, %xmm3
	; SSE41-NEXT: haddpd %xmm3, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_unsigned_v2f64:			; AVX1-LABEL: trunc_unsigned_v2f64:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vpermilpd {{.*#+}} xmm1 = xmm0[1,0]			; AVX1-NEXT: vroundpd $11, %xmm0, %xmm0
	; AVX1-NEXT: vmovsd {{.*#+}} xmm2 = mem[0],zero
	; AVX1-NEXT: vsubsd %xmm2, %xmm1, %xmm3
	; AVX1-NEXT: vcvttsd2si %xmm3, %rax
	; AVX1-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
	; AVX1-NEXT: xorq %rcx, %rax
	; AVX1-NEXT: vcvttsd2si %xmm1, %rdx
	; AVX1-NEXT: vucomisd %xmm2, %xmm1
	; AVX1-NEXT: cmovaeq %rax, %rdx
	; AVX1-NEXT: vsubsd %xmm2, %xmm0, %xmm1
	; AVX1-NEXT: vcvttsd2si %xmm1, %rax
	; AVX1-NEXT: xorq %rcx, %rax
	; AVX1-NEXT: vcvttsd2si %xmm0, %rcx
	; AVX1-NEXT: vucomisd %xmm2, %xmm0
	; AVX1-NEXT: cmovaeq %rax, %rcx
	; AVX1-NEXT: vmovq %rcx, %xmm0
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [1127219200,1160773632,0,0]
	; AVX1-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	; AVX1-NEXT: vmovapd {{.*#+}} xmm2 = [4.503600e+15,1.934281e+25]
	; AVX1-NEXT: vsubpd %xmm2, %xmm0, %xmm0
	; AVX1-NEXT: vmovq %rdx, %xmm3
	; AVX1-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm3[0],xmm1[0],xmm3[1],xmm1[1]
	; AVX1-NEXT: vsubpd %xmm2, %xmm1, %xmm1
	; AVX1-NEXT: vhaddpd %xmm1, %xmm0, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptoui <2 x double> %x to <2 x i64>			%i = fptoui <2 x double> %x to <2 x i64>
	%r = uitofp <2 x i64> %i to <2 x double>			%r = uitofp <2 x i64> %i to <2 x double>
	ret <2 x double> %r			ret <2 x double> %r
	}			}
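The removed SSE4.1/AVX1 lines in the unsigned i64 tests are the generic fptoui/uitofp expansions. That is why the ftrunc form is so much shorter.

- The fptoui half subtracts and compares against a constant-pool value (2^63 in the usual expansion). It then flips the top bit with the 0x8000000000000000 mask.
- The uitofp half uses the constant vector [1127219200,1160773632,0,0]. Its words 0x43300000 and 0x45300000 are the high halves of the doubles 2^52 and 2^84. Those are the [4.503600e+15,1.934281e+25] pair that gets subtracted.

Below is a scalar sketch of the second trick. It illustrates the technique rather than reproducing LLVM's lowering code. `u64_to_double` and `bits_to_double` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double (memcpy avoids aliasing UB).
static double bits_to_double(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// uint64 -> double via the 2^52 / 2^84 "magic exponent" trick.
double u64_to_double(std::uint64_t x) {
    const std::uint64_t lo = x & 0xFFFFFFFFull;  // low 32 bits
    const std::uint64_t hi = x >> 32;            // high 32 bits
    // High word 0x43300000 is 2^52; at that exponent one mantissa ulp is 1,
    // so OR-ing 'lo' into the mantissa yields exactly 2^52 + lo.
    const double lo_biased = bits_to_double(0x4330000000000000ull | lo);
    // High word 0x45300000 is 2^84; one ulp is 2^32, giving 2^84 + hi * 2^32.
    const double hi_biased = bits_to_double(0x4530000000000000ull | hi);
    const double lo_part = lo_biased - std::ldexp(1.0, 52);  // exact
    const double hi_part = hi_biased - std::ldexp(1.0, 84);  // exact
    assert(hi_part == static_cast<double>(hi) * 4294967296.0);
    // Only this final addition rounds, so the result is correctly rounded.
    return lo_part + hi_part;
}
```

The vector code does the same thing two lanes at a time: punpckldq builds the biased doubles, subpd removes the bias, and haddpd performs the final addition.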

	define <4 x double> @trunc_unsigned_v4f64(<4 x double> %x) nounwind {			define <4 x double> @trunc_unsigned_v4f64(<4 x double> %x) nounwind {
	; SSE2-LABEL: trunc_unsigned_v4f64:			; SSE2-LABEL: trunc_unsigned_v4f64:
	...
	; SSE2-NEXT: subpd %xmm3, %xmm4			; SSE2-NEXT: subpd %xmm3, %xmm4
	; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm4[2,3,0,1]			; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm4[2,3,0,1]
	; SSE2-NEXT: addpd %xmm4, %xmm2			; SSE2-NEXT: addpd %xmm4, %xmm2
	; SSE2-NEXT: unpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]			; SSE2-NEXT: unpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_unsigned_v4f64:			; SSE41-LABEL: trunc_unsigned_v4f64:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: movaps %xmm1, %xmm3			; SSE41-NEXT: roundpd $11, %xmm0, %xmm0
	; SSE41-NEXT: movhlps {{.*#+}} xmm3 = xmm1[1],xmm3[1]			; SSE41-NEXT: roundpd $11, %xmm1, %xmm1
	; SSE41-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
	; SSE41-NEXT: movaps %xmm3, %xmm4
	; SSE41-NEXT: subsd %xmm2, %xmm4
	; SSE41-NEXT: cvttsd2si %xmm4, %rcx
	; SSE41-NEXT: movabsq $-9223372036854775808, %rdx # imm = 0x8000000000000000
	; SSE41-NEXT: xorq %rdx, %rcx
	; SSE41-NEXT: cvttsd2si %xmm3, %rax
	; SSE41-NEXT: ucomisd %xmm2, %xmm3
	; SSE41-NEXT: cmovaeq %rcx, %rax
	; SSE41-NEXT: movaps %xmm1, %xmm3
	; SSE41-NEXT: subsd %xmm2, %xmm3
	; SSE41-NEXT: cvttsd2si %xmm3, %rsi
	; SSE41-NEXT: xorq %rdx, %rsi
	; SSE41-NEXT: cvttsd2si %xmm1, %rcx
	; SSE41-NEXT: ucomisd %xmm2, %xmm1
	; SSE41-NEXT: cmovaeq %rsi, %rcx
	; SSE41-NEXT: movaps %xmm0, %xmm1
	; SSE41-NEXT: movhlps {{.*#+}} xmm1 = xmm0[1],xmm1[1]
	; SSE41-NEXT: movaps %xmm1, %xmm3
	; SSE41-NEXT: subsd %xmm2, %xmm3
	; SSE41-NEXT: cvttsd2si %xmm3, %rsi
	; SSE41-NEXT: xorq %rdx, %rsi
	; SSE41-NEXT: cvttsd2si %xmm1, %rdi
	; SSE41-NEXT: ucomisd %xmm2, %xmm1
	; SSE41-NEXT: cmovaeq %rsi, %rdi
	; SSE41-NEXT: movaps %xmm0, %xmm1
	; SSE41-NEXT: subsd %xmm2, %xmm1
	; SSE41-NEXT: cvttsd2si %xmm1, %rsi
	; SSE41-NEXT: xorq %rdx, %rsi
	; SSE41-NEXT: cvttsd2si %xmm0, %rdx
	; SSE41-NEXT: ucomisd %xmm2, %xmm0
	; SSE41-NEXT: cmovaeq %rsi, %rdx
	; SSE41-NEXT: movq %rdx, %xmm0
	; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [1127219200,1160773632,0,0]
	; SSE41-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
	; SSE41-NEXT: movapd {{.*#+}} xmm3 = [4.503600e+15,1.934281e+25]
	; SSE41-NEXT: subpd %xmm3, %xmm0
	; SSE41-NEXT: movq %rdi, %xmm1
	; SSE41-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
	; SSE41-NEXT: subpd %xmm3, %xmm1
	; SSE41-NEXT: haddpd %xmm1, %xmm0
	; SSE41-NEXT: movq %rcx, %xmm1
	; SSE41-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
	; SSE41-NEXT: subpd %xmm3, %xmm1
	; SSE41-NEXT: movq %rax, %xmm4
	; SSE41-NEXT: punpckldq {{.*#+}} xmm4 = xmm4[0],xmm2[0],xmm4[1],xmm2[1]
	; SSE41-NEXT: subpd %xmm3, %xmm4
	; SSE41-NEXT: haddpd %xmm4, %xmm1
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_unsigned_v4f64:			; AVX1-LABEL: trunc_unsigned_v4f64:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vpermilpd {{.*#+}} xmm2 = xmm0[1,0]			; AVX1-NEXT: vroundpd $11, %ymm0, %ymm0
	; AVX1-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX1-NEXT: vsubsd %xmm1, %xmm2, %xmm3
	; AVX1-NEXT: vcvttsd2si %xmm3, %rcx
	; AVX1-NEXT: movabsq $-9223372036854775808, %rdx # imm = 0x8000000000000000
	; AVX1-NEXT: xorq %rdx, %rcx
	; AVX1-NEXT: vcvttsd2si %xmm2, %rax
	; AVX1-NEXT: vucomisd %xmm1, %xmm2
	; AVX1-NEXT: cmovaeq %rcx, %rax
	; AVX1-NEXT: vsubsd %xmm1, %xmm0, %xmm2
	; AVX1-NEXT: vcvttsd2si %xmm2, %rcx
	; AVX1-NEXT: xorq %rdx, %rcx
	; AVX1-NEXT: vcvttsd2si %xmm0, %rsi
	; AVX1-NEXT: vucomisd %xmm1, %xmm0
	; AVX1-NEXT: cmovaeq %rcx, %rsi
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
	; AVX1-NEXT: vpermilpd {{.*#+}} xmm2 = xmm0[1,0]
	; AVX1-NEXT: vsubsd %xmm1, %xmm2, %xmm3
	; AVX1-NEXT: vcvttsd2si %xmm3, %rcx
	; AVX1-NEXT: xorq %rdx, %rcx
	; AVX1-NEXT: vcvttsd2si %xmm2, %rdi
	; AVX1-NEXT: vucomisd %xmm1, %xmm2
	; AVX1-NEXT: cmovaeq %rcx, %rdi
	; AVX1-NEXT: vsubsd %xmm1, %xmm0, %xmm2
	; AVX1-NEXT: vcvttsd2si %xmm2, %rcx
	; AVX1-NEXT: xorq %rdx, %rcx
	; AVX1-NEXT: vcvttsd2si %xmm0, %rdx
	; AVX1-NEXT: vucomisd %xmm1, %xmm0
	; AVX1-NEXT: cmovaeq %rcx, %rdx
	; AVX1-NEXT: vmovq %rdx, %xmm0
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [1127219200,1160773632,0,0]
	; AVX1-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	; AVX1-NEXT: vmovapd {{.*#+}} xmm2 = [4.503600e+15,1.934281e+25]
	; AVX1-NEXT: vsubpd %xmm2, %xmm0, %xmm0
	; AVX1-NEXT: vmovq %rdi, %xmm3
	; AVX1-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm3[0],xmm1[0],xmm3[1],xmm1[1]
	; AVX1-NEXT: vsubpd %xmm2, %xmm3, %xmm3
	; AVX1-NEXT: vhaddpd %xmm3, %xmm0, %xmm0
	; AVX1-NEXT: vmovq %rsi, %xmm3
	; AVX1-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm3[0],xmm1[0],xmm3[1],xmm1[1]
	; AVX1-NEXT: vsubpd %xmm2, %xmm3, %xmm3
	; AVX1-NEXT: vmovq %rax, %xmm4
	; AVX1-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm4[0],xmm1[0],xmm4[1],xmm1[1]
	; AVX1-NEXT: vsubpd %xmm2, %xmm1, %xmm1
	; AVX1-NEXT: vhaddpd %xmm1, %xmm3, %xmm1
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptoui <4 x double> %x to <4 x i64>			%i = fptoui <4 x double> %x to <4 x i64>
	%r = uitofp <4 x i64> %i to <4 x double>			%r = uitofp <4 x i64> %i to <4 x double>
	ret <4 x double> %r			ret <4 x double> %r
	}			}

	define float @trunc_signed_f32(float %x) nounwind {			define float @trunc_signed_f32(float %x) nounwind {
	; SSE2-LABEL: trunc_signed_f32:			; SSE2-LABEL: trunc_signed_f32:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: cvttss2si %xmm0, %eax			; SSE2-NEXT: cvttss2si %xmm0, %eax
	; SSE2-NEXT: xorps %xmm0, %xmm0			; SSE2-NEXT: xorps %xmm0, %xmm0
	; SSE2-NEXT: cvtsi2ssl %eax, %xmm0			; SSE2-NEXT: cvtsi2ssl %eax, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_signed_f32:			; SSE41-LABEL: trunc_signed_f32:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: cvttss2si %xmm0, %eax			; SSE41-NEXT: roundss $11, %xmm0, %xmm0
	; SSE41-NEXT: xorps %xmm0, %xmm0
	; SSE41-NEXT: cvtsi2ssl %eax, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_signed_f32:			; AVX1-LABEL: trunc_signed_f32:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vcvttss2si %xmm0, %eax			; AVX1-NEXT: vroundss $11, %xmm0, %xmm0, %xmm0
	; AVX1-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptosi float %x to i32			%i = fptosi float %x to i32
	%r = sitofp i32 %i to float			%r = sitofp i32 %i to float
	ret float %r			ret float %r
	}			}

	define double @trunc_signed_f64(double %x) nounwind {			define double @trunc_signed_f64(double %x) nounwind {
	; SSE2-LABEL: trunc_signed_f64:			; SSE2-LABEL: trunc_signed_f64:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: cvttsd2si %xmm0, %rax			; SSE2-NEXT: cvttsd2si %xmm0, %rax
	; SSE2-NEXT: xorps %xmm0, %xmm0			; SSE2-NEXT: xorps %xmm0, %xmm0
	; SSE2-NEXT: cvtsi2sdq %rax, %xmm0			; SSE2-NEXT: cvtsi2sdq %rax, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_signed_f64:			; SSE41-LABEL: trunc_signed_f64:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: cvttsd2si %xmm0, %rax			; SSE41-NEXT: roundsd $11, %xmm0, %xmm0
	; SSE41-NEXT: xorps %xmm0, %xmm0
	; SSE41-NEXT: cvtsi2sdq %rax, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_signed_f64:			; AVX1-LABEL: trunc_signed_f64:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vcvttsd2si %xmm0, %rax			; AVX1-NEXT: vroundsd $11, %xmm0, %xmm0, %xmm0
	; AVX1-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptosi double %x to i64			%i = fptosi double %x to i64
	%r = sitofp i64 %i to double			%r = sitofp i64 %i to double
	ret double %r			ret double %r
	}			}

	define <4 x float> @trunc_signed_v4f32(<4 x float> %x) nounwind {			define <4 x float> @trunc_signed_v4f32(<4 x float> %x) nounwind {
	; SSE2-LABEL: trunc_signed_v4f32:			; SSE2-LABEL: trunc_signed_v4f32:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: cvttps2dq %xmm0, %xmm0			; SSE2-NEXT: cvttps2dq %xmm0, %xmm0
	; SSE2-NEXT: cvtdq2ps %xmm0, %xmm0			; SSE2-NEXT: cvtdq2ps %xmm0, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_signed_v4f32:			; SSE41-LABEL: trunc_signed_v4f32:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: cvttps2dq %xmm0, %xmm0			; SSE41-NEXT: roundps $11, %xmm0, %xmm0
	; SSE41-NEXT: cvtdq2ps %xmm0, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_signed_v4f32:			; AVX1-LABEL: trunc_signed_v4f32:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vcvttps2dq %xmm0, %xmm0			; AVX1-NEXT: vroundps $11, %xmm0, %xmm0
	; AVX1-NEXT: vcvtdq2ps %xmm0, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptosi <4 x float> %x to <4 x i32>			%i = fptosi <4 x float> %x to <4 x i32>
	%r = sitofp <4 x i32> %i to <4 x float>			%r = sitofp <4 x i32> %i to <4 x float>
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <2 x double> @trunc_signed_v2f64(<2 x double> %x) nounwind {			define <2 x double> @trunc_signed_v2f64(<2 x double> %x) nounwind {
	; SSE2-LABEL: trunc_signed_v2f64:			; SSE2-LABEL: trunc_signed_v2f64:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: cvttsd2si %xmm0, %rax			; SSE2-NEXT: cvttsd2si %xmm0, %rax
	; SSE2-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]			; SSE2-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
	; SSE2-NEXT: cvttsd2si %xmm0, %rcx			; SSE2-NEXT: cvttsd2si %xmm0, %rcx
	; SSE2-NEXT: xorps %xmm0, %xmm0			; SSE2-NEXT: xorps %xmm0, %xmm0
	; SSE2-NEXT: cvtsi2sdq %rax, %xmm0			; SSE2-NEXT: cvtsi2sdq %rax, %xmm0
	; SSE2-NEXT: cvtsi2sdq %rcx, %xmm1			; SSE2-NEXT: cvtsi2sdq %rcx, %xmm1
	; SSE2-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; SSE2-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_signed_v2f64:			; SSE41-LABEL: trunc_signed_v2f64:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: cvttsd2si %xmm0, %rax			; SSE41-NEXT: roundpd $11, %xmm0, %xmm0
	; SSE41-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
	; SSE41-NEXT: cvttsd2si %xmm0, %rcx
	; SSE41-NEXT: xorps %xmm0, %xmm0
	; SSE41-NEXT: cvtsi2sdq %rax, %xmm0
	; SSE41-NEXT: cvtsi2sdq %rcx, %xmm1
	; SSE41-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_signed_v2f64:			; AVX1-LABEL: trunc_signed_v2f64:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vpermilpd {{.*#+}} xmm1 = xmm0[1,0]			; AVX1-NEXT: vroundpd $11, %xmm0, %xmm0
	; AVX1-NEXT: vcvttsd2si %xmm1, %rax
	; AVX1-NEXT: vcvttsd2si %xmm0, %rcx
	; AVX1-NEXT: vcvtsi2sdq %rcx, %xmm2, %xmm0
	; AVX1-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm1
	; AVX1-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptosi <2 x double> %x to <2 x i64>			%i = fptosi <2 x double> %x to <2 x i64>
	%r = sitofp <2 x i64> %i to <2 x double>			%r = sitofp <2 x i64> %i to <2 x double>
	ret <2 x double> %r			ret <2 x double> %r
	}			}

	define <4 x double> @trunc_signed_v4f64(<4 x double> %x) nounwind {			define <4 x double> @trunc_signed_v4f64(<4 x double> %x) nounwind {
	; SSE2-LABEL: trunc_signed_v4f64:			; SSE2-LABEL: trunc_signed_v4f64:
	...
	; SSE2-NEXT: xorps %xmm1, %xmm1			; SSE2-NEXT: xorps %xmm1, %xmm1
	; SSE2-NEXT: cvtsi2sdq %rax, %xmm1			; SSE2-NEXT: cvtsi2sdq %rax, %xmm1
	; SSE2-NEXT: cvtsi2sdq %rcx, %xmm2			; SSE2-NEXT: cvtsi2sdq %rcx, %xmm2
	; SSE2-NEXT: movlhps {{.*#+}} xmm1 = xmm1[0],xmm2[0]			; SSE2-NEXT: movlhps {{.*#+}} xmm1 = xmm1[0],xmm2[0]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_signed_v4f64:			; SSE41-LABEL: trunc_signed_v4f64:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: cvttsd2si %xmm1, %rax			; SSE41-NEXT: roundpd $11, %xmm0, %xmm0
	; SSE41-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]			; SSE41-NEXT: roundpd $11, %xmm1, %xmm1
	; SSE41-NEXT: cvttsd2si %xmm1, %rcx
	; SSE41-NEXT: cvttsd2si %xmm0, %rdx
	; SSE41-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
	; SSE41-NEXT: cvttsd2si %xmm0, %rsi
	; SSE41-NEXT: xorps %xmm0, %xmm0
	; SSE41-NEXT: cvtsi2sdq %rdx, %xmm0
	; SSE41-NEXT: xorps %xmm1, %xmm1
	; SSE41-NEXT: cvtsi2sdq %rsi, %xmm1
	; SSE41-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE41-NEXT: xorps %xmm1, %xmm1
	; SSE41-NEXT: cvtsi2sdq %rax, %xmm1
	; SSE41-NEXT: cvtsi2sdq %rcx, %xmm2
	; SSE41-NEXT: movlhps {{.*#+}} xmm1 = xmm1[0],xmm2[0]
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_signed_v4f64:			; AVX1-LABEL: trunc_signed_v4f64:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vpermilpd {{.*#+}} xmm1 = xmm0[1,0]			; AVX1-NEXT: vroundpd $11, %ymm0, %ymm0
	; AVX1-NEXT: vcvttsd2si %xmm1, %rax
	; AVX1-NEXT: vcvttsd2si %xmm0, %rcx
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
	; AVX1-NEXT: vpermilpd {{.*#+}} xmm1 = xmm0[1,0]
	; AVX1-NEXT: vcvttsd2si %xmm1, %rdx
	; AVX1-NEXT: vcvttsd2si %xmm0, %rsi
	; AVX1-NEXT: vcvtsi2sdq %rsi, %xmm2, %xmm0
	; AVX1-NEXT: vcvtsi2sdq %rdx, %xmm2, %xmm1
	; AVX1-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX1-NEXT: vcvtsi2sdq %rcx, %xmm2, %xmm1
	; AVX1-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm2
	; AVX1-NEXT: vmovlhps {{.*#+}} xmm1 = xmm1[0],xmm2[0]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptosi <4 x double> %x to <4 x i64>			%i = fptosi <4 x double> %x to <4 x i64>
	%r = sitofp <4 x i64> %i to <4 x double>			%r = sitofp <4 x i64> %i to <4 x double>
	ret <4 x double> %r			ret <4 x double> %r
	}			}
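For the signed tests, the same reasoning applies to fptosi/sitofp. One corner is worth keeping in mind when reading these checks. For inputs in (-1, 0), the integer round trip yields +0.0, while a real truncation (roundss $11, or C's truncf) yields -0.0. The two values compare equal but differ in sign, so the replacement is exact only where the sign of a zero result does not matter.

Below is a minimal C++ sketch of that difference. It assumes inputs whose truncated value fits in int32_t. `via_int32` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// (float)(int)x: defined only when the truncated value fits in int32_t,
// so the precondition rules out NaN, infinities and anything >= 2^31
// or below -2^31.
float via_int32(float x) {
    assert(x >= -2147483648.0f && x < 2147483648.0f);
    return static_cast<float>(static_cast<std::int32_t>(x));
}
```

With `x = -0.5f`, `via_int32(x)` returns `+0.0f`, while `std::trunc(x)` returns `-0.0f`. `==` treats these as equal; `std::signbit` does not.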

Revision Contents

Diff 139871

include/llvm/CodeGen/ISDOpcodes.h

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

test/CodeGen/ARM/ftrunc.ll

test/CodeGen/PowerPC/fp-int128-fp-combine.ll

test/CodeGen/PowerPC/fp-to-int-to-fp.ll

test/CodeGen/PowerPC/no-extra-fp-conv-ldst.ll

test/CodeGen/X86/2011-10-19-widen_vselect.ll

test/CodeGen/X86/ftrunc.ll
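The DAGCombiner.cpp change rewrites the cast pair into ISD::FTRUNC. The real code operates on SelectionDAG nodes, so what follows is only a toy model of the pattern. All types and names here (`Node`, `Op`, `Ty`, `make_node`, `fold_fp_to_int_to_fp`) are hypothetical and are not LLVM's API.

The toy fold works as follows:

- It folds sitofp(fptosi x) and uitofp(fptoui x) into ftrunc x when the final FP type matches the type of x.
- It leaves mixed-signedness pairs alone.
- It is gated on a `has_native_trunc` flag. The flag models the concern raised in review about targets such as armv7 or pre-SSE4.1 x86, which have hardware conversions but no truncate instruction.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy expression node; a stand-in for SelectionDAG nodes, not LLVM's API.
enum class Op { Value, FpToSi, FpToUi, SiToFp, UiToFp, FTrunc };
enum class Ty { F32, F64, I32, I64 };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
    Op op;
    Ty ty;            // result type of this node
    NodePtr operand;  // null for Op::Value
};

NodePtr make_node(Op op, Ty ty, NodePtr operand = nullptr) {
    return std::make_shared<Node>(Node{op, ty, std::move(operand)});
}

// Fold (itofp (fptoi x)) -> (ftrunc x) when signedness matches and the
// final FP type equals x's type. Otherwise return the node unchanged.
NodePtr fold_fp_to_int_to_fp(const NodePtr& n, bool has_native_trunc) {
    assert(n != nullptr);
    if (!has_native_trunc || !n->operand)
        return n;
    const NodePtr& inner = n->operand;
    const bool signed_pair = n->op == Op::SiToFp && inner->op == Op::FpToSi;
    const bool unsigned_pair = n->op == Op::UiToFp && inner->op == Op::FpToUi;
    if (!(signed_pair || unsigned_pair))
        return n;
    const NodePtr& src = inner->operand;
    if (!src || src->ty != n->ty)
        return n;  // e.g. float -> int -> double is not a plain truncation
    return make_node(Op::FTrunc, n->ty, src);
}
```

The width of the intermediate integer type is deliberately not checked. Whenever the casts are defined, the truncated value fits in that type, so the round trip reproduces it regardless of width.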
