This is an archive of the discontinued LLVM Phabricator instance.

Differential D48085

[DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
Closed · Public

Authored by spatel on Jun 12 2018, 10:39 AM.


Details

Reviewers

rdhindsa
nemanjai
hans
dmgreen
scanon
echristo
thakis
efriedma
craig.topper
RKSimon
javed.absar
lebedev.ri

Commits

rGd052de856d83: [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
rL335761: [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros

Summary

As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not:

#include <stdio.h>
  
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}

$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option.
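
The sign difference described in this summary can also be seen without relying on any optimizer. The following is a minimal sketch, assuming a 32-bit IEEE-754 `float` and inputs that fit in `int`. The helper names (`sign_bit_set`, `cast_roundtrip`, `ftrunc_model`) are hypothetical and are not part of LLVM; `ftrunc_model` is only a software stand-in for an ftrunc-style instruction.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the IEEE-754 sign bit of f is set (true for -0.0f). */
static int sign_bit_set(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return (bits >> 31) != 0;
}

/* The source pattern from the summary: float -> int -> float. */
static float cast_roundtrip(float x) {
  return (float)(int)x;
}

/* Hypothetical model of an ftrunc-style operation: round toward zero but
   keep the sign bit of the input. Only meaningful when x fits in int. */
static float ftrunc_model(float x) {
  float t = (float)(int)x;
  uint32_t xb, tb;
  memcpy(&xb, &x, sizeof xb);
  memcpy(&tb, &t, sizeof tb);
  tb = (tb & 0x7fffffffu) | (xb & 0x80000000u); /* copy sign of x onto t */
  memcpy(&t, &tb, sizeof t);
  return t;
}
```

For `x = -0.8f`, `cast_roundtrip(x)` has a clear sign bit while `ftrunc_model(x)` has it set. The two results still compare equal with `==`, which is why the difference only shows up when the sign of zero is observed (for example through `printf`).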

Diff Detail

Event Timeline

spatel created this revision. Jun 12 2018, 10:39 AM

Herald added a reviewer: javed.absar. Jun 12 2018, 10:39 AM

Herald added a subscriber: mcrosier.

lebedev.ri added a subscriber: lebedev.ri. Jun 12 2018, 10:49 AM

lebedev.ri added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11049–11050	Couldn't we do `and (ftrunc %x), (-1 >> 1)`, i.e. unset the sign bit? Or is that worse than not transforming to `ftrunc` in the first place?

spatel added inline comments. Jun 12 2018, 11:01 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11049–11050	You'd want to keep this in FP, so I think that would be:
fabs (ftrunc x)

And that could be better than the alternative in some cases. See D44909 for what this looks like without using ftrunc. If the sequence is more than 2 converts, fabs is likely a winner.

But given the fallout from the original patch, I expect this is the patch most people would prefer. :) We'll avoid the UB headaches for most code, and code that cares about FP perf likely has some loosened FP constraints anyway.

mcberg2017 added a subscriber: mcberg2017. Jun 12 2018, 11:24 AM

lebedev.ri added inline comments. Jun 12 2018, 11:52 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11049–11050	> You'd want to keep this in FP, so I think that would be:
> fabs (ftrunc x)

Right, right. Otherwise you might need to leave the FP domain, etc.

> But given the fallout from the original patch, I expect this is the patch most people would prefer. :)

That cat is already out of the bag, though, and modulo this edge case, the transform is valid, I think. Backtracking like this, while not for no reason, may be viewed as acceptance of the views of some who think UB is bad and that lack of UB [in code] should not be assumed by compilers. But this is purely my opinion.

spatel added inline comments. Jun 12 2018, 1:04 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11049–11050	Depends on your perspective. For brave (much appreciated!) projects that track LLVM trunk, yes they've already suffered/benefited. But most potential customers won't see this change until the next major release. But this discussion is independent of this particular patch. We have a miscompile that must be fixed. Salvaging perf is secondary, and I don't think we can do it universally - ie, converting 2 casts to fabs+ftrunc or select+fcmp+fabs+ftrunc requires a TLI hook, so that's definitely a bigger patch than what we have here currently.
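
The two FABS-based sequences named in this comment can be sketched in C. This is an illustration only, assuming a 32-bit IEEE-754 `float` and inputs that fit in `int`; all function names are hypothetical, and `ftrunc_model` is a software stand-in for the target's ftrunc instruction. In particular, reading `select+fcmp+fabs+ftrunc` as "clear the sign only when the truncated value is zero" is an interpretation, not code taken from the patch.

```c
#include <stdint.h>
#include <string.h>

static uint32_t to_bits(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u;
}

static float from_bits(uint32_t u) {
  float f;
  memcpy(&f, &u, sizeof f);
  return f;
}

/* Hypothetical model of ftrunc: truncate toward zero, keep the input sign. */
static float ftrunc_model(float x) {
  float t = (float)(int)x;
  return from_bits((to_bits(t) & 0x7fffffffu) | (to_bits(x) & 0x80000000u));
}

/* fabs implemented by clearing the sign bit. */
static float fabs_bits(float f) {
  return from_bits(to_bits(f) & 0x7fffffffu);
}

/* fabs+ftrunc: plausible for the unsigned pair (fptoui, uitofp), where the
   only negative inputs with a defined result truncate to zero. */
static float unsigned_pair_via_ftrunc(float x) {
  return fabs_bits(ftrunc_model(x));
}

/* select+fcmp+fabs+ftrunc: for the signed pair, negative results such as
   -2.0 must be kept, so the sign is cleared only when the result is zero. */
static float signed_pair_via_ftrunc(float x) {
  float t = ftrunc_model(x);
  return (t == 0.0f) ? fabs_bits(t) : t;
}
```

Both helpers return +0.0 for `-0.8f`, matching the cast sequence, while `signed_pair_via_ftrunc(-2.5f)` still yields `-2.0f`.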

spatel mentioned this in D47909: Utilize new SDNode flag functionality to expand current support for fadd. Jun 12 2018, 2:16 PM

Ping.

Should this adjust the ReleaseNotes?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11049–11050	Could you please at least add a `FIXME`? :)

In D48085#1136860, @lebedev.ri wrote:

> Should this adjust the ReleaseNotes?

The docs provide extra warning/workaround, and this change doesn't affect that language IMO (there's less chance we're going to break code, but we don't have to disclose that).

For reference, here are links to the docs that we added with the previous patches (last updated with D46236 I think):
https://clang.llvm.org/docs/ReleaseNotes.html#new-compiler-flags
https://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
http://llvm.org/docs/ReleaseNotes.html#non-comprehensive-list-of-changes-in-this-release

Let me know if you see anything that can be improved.

Patch updated:
Added a 'TODO' comment about using a FABS-based sequence if we can't ignore -0.0.

In D48085#1136900, @spatel wrote:

> The docs provide extra warning/workaround, and this change doesn't affect that language IMO (there's less chance we're going to break code, but we don't have to disclose that).
>
> For reference, here are links to the docs that we added with the previous patches (last updated with D46236 I think):
> https://clang.llvm.org/docs/ReleaseNotes.html#new-compiler-flags
> https://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
> http://llvm.org/docs/ReleaseNotes.html#non-comprehensive-list-of-changes-in-this-release
>
> Let me know if you see anything that can be improved.

The text in the first and last links suggests that this optimization is always done.
But now it will only be done in the presence of the no-signed-zeros-fp-math attribute,
which is controlled by -ffast-math (or maybe some more fine-grained option, too?)

In D48085#1136910, @lebedev.ri wrote:

> The text in the first and last links suggests that this optimization is always done.
> But now it will only be done in the presence of the no-signed-zeros-fp-math attribute,
> which is controlled by -ffast-math (or maybe some more fine-grained option, too?)

Right - it should still be allowed with "-fno-signed-zeros" at a minimum.

If we change the docs to provide the exact details of the potential optimization, then we have to adjust the docs anytime that changes (eg, if we implement the fabs trick for some subset of data types on some subset targets). So my preference is to not document the implementation at that level, but if you think that the text is misleading, I'll add a note.

In D48085#1136910, @lebedev.ri wrote:

> The text in the first and last links suggests that this optimization is always done.

One more point here: if it sounds like it's always done, then that's already wrong.

We only do the transformation when the target has a legal FTRUNC op, so some fraction of targets are never affected by this transform regardless of fast-math. Eg, x86 doesn't have roundXX before SSE4.1.

Ping * 2.

I still think the docs need some adjustment.
And not all arches have a full set of tests (i.e. one with "no-signed-zeros-fp-math"="true", one without).
But this is a regression, so I'd guess that can be iterated on later.

It would be best if someone else could agree/disagree, but other than that, LG.

This revision is now accepted and ready to land. Jun 26 2018, 8:39 AM

In D48085#1143602, @lebedev.ri wrote:

> I still think the docs need some adjustment.
> And not all arches have a full set of tests (i.e. one with "no-signed-zeros-fp-math"="true", one without).
> But this is a regression, so I'd guess that can be iterated on later.
>
> It would be best if someone else could agree/disagree, but other than that, LG.

Thanks! I'm not opposed to rewording the docs, but I'm also not sure how to adjust them to make it better rather than worse.

If anyone would like to take a shot at that, I'm happy to review (and as I said, I think any text tweaking should be independent of this code change...though others may disagree with that assessment).

Closed by commit rL335761: [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros (authored by spatel). Jun 27 2018, 11:21 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D77895: [x86] use vector instructions to lower FP->int->FP casts. Apr 10 2020, 2:53 PM

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	8 lines
test/CodeGen/AArch64/ftrunc.ll	10 lines
test/CodeGen/PowerPC/fp-int128-fp-combine.ll	29 lines
test/CodeGen/PowerPC/fp-to-int-to-fp.ll	2 lines
test/CodeGen/PowerPC/ftrunc-vec.ll	10 lines
test/CodeGen/PowerPC/no-extra-fp-conv-ldst.ll	2 lines
test/CodeGen/X86/2011-10-19-widen_vselect.ll	6 lines
test/CodeGen/X86/avx-cvttp2si.ll	10 lines
test/CodeGen/X86/ftrunc.ll	23 lines
test/CodeGen/X86/sse-cvttp2si.ll	26 lines

Diff 150979

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[11,039 lines not shown]	static SDValue foldFPToIntToFP(SDNode *N, SelectionDAG &DAG,
// unexpected results. Ie, programs may be relying on the platform-specific		// unexpected results. Ie, programs may be relying on the platform-specific
// undefined behavior when the float-to-int conversion overflows.		// undefined behavior when the float-to-int conversion overflows.
const Function &F = DAG.getMachineFunction().getFunction();		const Function &F = DAG.getMachineFunction().getFunction();
Attribute StrictOverflow = F.getFnAttribute("strict-float-cast-overflow");		Attribute StrictOverflow = F.getFnAttribute("strict-float-cast-overflow");
if (StrictOverflow.getValueAsString().equals("false"))		if (StrictOverflow.getValueAsString().equals("false"))
return SDValue();		return SDValue();

// We only do this if the target has legal ftrunc. Otherwise, we'd likely be		// We only do this if the target has legal ftrunc. Otherwise, we'd likely be
// replacing casts with a libcall.		// replacing casts with a libcall. We also must be allowed to ignore -0.0
		// because FTRUNC will return -0.0 for (-1.0, -0.0), but using integer
		// conversions would return +0.0.
		// FIXME: We should be able to use node-level FMF here.
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
if (!TLI.isOperationLegal(ISD::FTRUNC, VT))		if (!TLI.isOperationLegal(ISD::FTRUNC, VT) ||
		!DAG.getTarget().Options.NoSignedZerosFPMath)
return SDValue();		return SDValue();

// fptosi/fptoui round towards zero, so converting from FP to integer and		// fptosi/fptoui round towards zero, so converting from FP to integer and
// back is the same as an 'ftrunc': [us]itofp (fpto[us]i X) --> ftrunc X		// back is the same as an 'ftrunc': [us]itofp (fpto[us]i X) --> ftrunc X
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
if (N->getOpcode() == ISD::SINT_TO_FP && N0.getOpcode() == ISD::FP_TO_SINT &&		if (N->getOpcode() == ISD::SINT_TO_FP && N0.getOpcode() == ISD::FP_TO_SINT &&
N0.getOperand(0).getValueType() == VT)		N0.getOperand(0).getValueType() == VT)
return DAG.getNode(ISD::FTRUNC, SDLoc(N), VT, N0.getOperand(0));		return DAG.getNode(ISD::FTRUNC, SDLoc(N), VT, N0.getOperand(0));
[7,139 lines not shown]
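
The early exits in this hunk can be summarized as a single predicate. The following C sketch is a simplified model, not LLVM code: the function name and parameters are hypothetical, the attribute is passed as a plain string (empty when absent), and the later pattern match on the cast opcodes and value types is not modeled.

```c
#include <stdbool.h>
#include <string.h>

/* Model of the guards that run before the casts are replaced by ftrunc.
   strict_overflow_attr: value of the "strict-float-cast-overflow" function
                         attribute, or "" if the attribute is absent.
   ftrunc_legal:         the target reports FTRUNC as legal for the type.
   no_signed_zeros:      signed zeros may be ignored
                         ("no-signed-zeros-fp-math"="true"). */
static bool may_fold_to_ftrunc(const char *strict_overflow_attr,
                               bool ftrunc_legal, bool no_signed_zeros) {
  /* Programs may rely on platform-specific overflow behavior of the casts. */
  if (strcmp(strict_overflow_attr, "false") == 0)
    return false;
  /* Without a legal ftrunc the casts would likely become a libcall, and
     without no-signed-zeros the result could be -0.0 instead of +0.0. */
  if (!ftrunc_legal || !no_signed_zeros)
    return false;
  return true;
}
```

Under this model, a function without the no-signed-zeros attribute never takes the ftrunc path, which matches the updated test expectations that keep the conversion instructions or libcalls in that case.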

test/CodeGen/AArch64/ftrunc.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=aarch64-unknown-unknown < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-unknown-unknown < %s | FileCheck %s

	define float @trunc_unsigned_f32(float %x) {			define float @trunc_unsigned_f32(float %x) #0 {
	; CHECK-LABEL: trunc_unsigned_f32:			; CHECK-LABEL: trunc_unsigned_f32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: frintz s0, s0			; CHECK-NEXT: frintz s0, s0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%i = fptoui float %x to i32			%i = fptoui float %x to i32
	%r = uitofp i32 %i to float			%r = uitofp i32 %i to float
	ret float %r			ret float %r
	}			}

	define double @trunc_unsigned_f64(double %x) {			define double @trunc_unsigned_f64(double %x) #0 {
	; CHECK-LABEL: trunc_unsigned_f64:			; CHECK-LABEL: trunc_unsigned_f64:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: frintz d0, d0			; CHECK-NEXT: frintz d0, d0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%i = fptoui double %x to i64			%i = fptoui double %x to i64
	%r = uitofp i64 %i to double			%r = uitofp i64 %i to double
	ret double %r			ret double %r
	}			}

	define float @trunc_signed_f32(float %x) {			define float @trunc_signed_f32(float %x) #0 {
	; CHECK-LABEL: trunc_signed_f32:			; CHECK-LABEL: trunc_signed_f32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: frintz s0, s0			; CHECK-NEXT: frintz s0, s0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%i = fptosi float %x to i32			%i = fptosi float %x to i32
	%r = sitofp i32 %i to float			%r = sitofp i32 %i to float
	ret float %r			ret float %r
	}			}

	define double @trunc_signed_f64(double %x) {			define double @trunc_signed_f64(double %x) #0 {
	; CHECK-LABEL: trunc_signed_f64:			; CHECK-LABEL: trunc_signed_f64:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: frintz d0, d0			; CHECK-NEXT: frintz d0, d0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%i = fptosi double %x to i64			%i = fptosi double %x to i64
	%r = sitofp i64 %i to double			%r = sitofp i64 %i to double
	ret double %r			ret double %r
	}			}

				attributes #0 = { "no-signed-zeros-fp-math"="true" }

test/CodeGen/PowerPC/fp-int128-fp-combine.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -O0 -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s			; RUN: llc -O0 -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s

	; xscvdpsxds should NOT be emitted, since it saturates the result down to i64.			; xscvdpsxds should NOT be emitted, since it saturates the result down to i64.
				; We can't use friz here because it may return -0.0 where the original code doesn't.

	define float @f_i128_f(float %v) {			define float @f_i128_f(float %v) {
	; CHECK-LABEL: f_i128_f:			; CHECK-LABEL: f_i128_f:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: mflr 0
				; CHECK-NEXT: std 0, 16(1)
				; CHECK-NEXT: stdu 1, -32(1)
				; CHECK-NEXT: .cfi_def_cfa_offset 32
				; CHECK-NEXT: .cfi_offset lr, 16
				; CHECK-NEXT: bl __fixsfti
				; CHECK-NEXT: nop
				; CHECK-NEXT: bl __floattisf
				; CHECK-NEXT: nop
				; CHECK-NEXT: addi 1, 1, 32
				; CHECK-NEXT: ld 0, 16(1)
				; CHECK-NEXT: mtlr 0
				; CHECK-NEXT: blr
				entry:
				%a = fptosi float %v to i128
				%b = sitofp i128 %a to float
				ret float %b
				}

				; NSZ, so it's safe to friz.

				define float @f_i128_fi_nsz(float %v) #0 {
				; CHECK-LABEL: f_i128_fi_nsz:
				; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: friz 1, 1			; CHECK-NEXT: friz 1, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	entry:			entry:
	%a = fptosi float %v to i128			%a = fptosi float %v to i128
	%b = sitofp i128 %a to float			%b = sitofp i128 %a to float
	ret float %b			ret float %b
	}			}

				attributes #0 = { "no-signed-zeros-fp-math"="true" }

test/CodeGen/PowerPC/fp-to-int-to-fp.ll

[56 lines not shown]	entry:
%conv1 = uitofp i64 %conv to double		%conv1 = uitofp i64 %conv to double
ret double %conv1		ret double %conv1

; FPCVT-LABEL: @fooudl		; FPCVT-LABEL: @fooudl
; FPCVT: friz 1, 1		; FPCVT: friz 1, 1
; FPCVT: blr		; FPCVT: blr
}		}

attributes #0 = { nounwind readnone }		attributes #0 = { nounwind readnone "no-signed-zeros-fp-math"="true" }

test/CodeGen/PowerPC/ftrunc-vec.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-unknown -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-unknown -verify-machineinstrs < %s | FileCheck %s

	define <4 x float> @truncf32(<4 x float> %a) {			define <4 x float> @truncf32(<4 x float> %a) #0 {
	; CHECK-LABEL: truncf32:			; CHECK-LABEL: truncf32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: xvrspiz 34, 34			; CHECK-NEXT: xvrspiz 34, 34
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%t0 = fptosi <4 x float> %a to <4 x i32>			%t0 = fptosi <4 x float> %a to <4 x i32>
	%t1 = sitofp <4 x i32> %t0 to <4 x float>			%t1 = sitofp <4 x i32> %t0 to <4 x float>
	ret <4 x float> %t1			ret <4 x float> %t1
	}			}

	define <2 x double> @truncf64(<2 x double> %a) {			define <2 x double> @truncf64(<2 x double> %a) #0 {
	; CHECK-LABEL: truncf64:			; CHECK-LABEL: truncf64:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: xvrdpiz 34, 34			; CHECK-NEXT: xvrdpiz 34, 34
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%t0 = fptosi <2 x double> %a to <2 x i64>			%t0 = fptosi <2 x double> %a to <2 x i64>
	%t1 = sitofp <2 x i64> %t0 to <2 x double>			%t1 = sitofp <2 x i64> %t0 to <2 x double>
	ret <2 x double> %t1			ret <2 x double> %t1
	}			}

	define <4 x float> @truncf32u(<4 x float> %a) {			define <4 x float> @truncf32u(<4 x float> %a) #0 {
	; CHECK-LABEL: truncf32u:			; CHECK-LABEL: truncf32u:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: xvrspiz 34, 34			; CHECK-NEXT: xvrspiz 34, 34
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%t0 = fptoui <4 x float> %a to <4 x i32>			%t0 = fptoui <4 x float> %a to <4 x i32>
	%t1 = uitofp <4 x i32> %t0 to <4 x float>			%t1 = uitofp <4 x i32> %t0 to <4 x float>
	ret <4 x float> %t1			ret <4 x float> %t1
	}			}

	define <2 x double> @truncf64u(<2 x double> %a) {			define <2 x double> @truncf64u(<2 x double> %a) #0 {
	; CHECK-LABEL: truncf64u:			; CHECK-LABEL: truncf64u:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: xvrdpiz 34, 34			; CHECK-NEXT: xvrdpiz 34, 34
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%t0 = fptoui <2 x double> %a to <2 x i64>			%t0 = fptoui <2 x double> %a to <2 x i64>
	%t1 = uitofp <2 x i64> %t0 to <2 x double>			%t1 = uitofp <2 x i64> %t0 to <2 x double>
	ret <2 x double> %t1			ret <2 x double> %t1
	}			}

				attributes #0 = { "no-signed-zeros-fp-math"="true" }

test/CodeGen/PowerPC/no-extra-fp-conv-ldst.ll

[70 lines not shown]	entry:
%conv1 = uitofp i32 %conv to double		%conv1 = uitofp i32 %conv to double
ret double %conv1		ret double %conv1

; CHECK-LABEL: @fooud		; CHECK-LABEL: @fooud
; CHECK: friz 1, 1		; CHECK: friz 1, 1
; CHECK: blr		; CHECK: blr
}		}

attributes #0 = { nounwind readonly }		attributes #0 = { nounwind readonly "no-signed-zeros-fp-math"="true" }

test/CodeGen/X86/2011-10-19-widen_vselect.ll

	[65 lines not shown]
	}			}

	define void @full_test() {			define void @full_test() {
	; X32-LABEL: full_test:			; X32-LABEL: full_test:
	; X32: # %bb.0: # %entry			; X32: # %bb.0: # %entry
	; X32-NEXT: subl $60, %esp			; X32-NEXT: subl $60, %esp
	; X32-NEXT: .cfi_def_cfa_offset 64			; X32-NEXT: .cfi_def_cfa_offset 64
	; X32-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero			; X32-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
	; X32-NEXT: roundps $11, %xmm2, %xmm1			; X32-NEXT: cvttps2dq %xmm2, %xmm0
				; X32-NEXT: cvtdq2ps %xmm0, %xmm1
	; X32-NEXT: xorps %xmm0, %xmm0			; X32-NEXT: xorps %xmm0, %xmm0
	; X32-NEXT: cmpltps %xmm2, %xmm0			; X32-NEXT: cmpltps %xmm2, %xmm0
	; X32-NEXT: movaps {{.*#+}} xmm3 = <1,1,u,u>			; X32-NEXT: movaps {{.*#+}} xmm3 = <1,1,u,u>
	; X32-NEXT: addps %xmm1, %xmm3			; X32-NEXT: addps %xmm1, %xmm3
	; X32-NEXT: movaps %xmm1, %xmm4			; X32-NEXT: movaps %xmm1, %xmm4
	; X32-NEXT: blendvps %xmm0, %xmm3, %xmm4			; X32-NEXT: blendvps %xmm0, %xmm3, %xmm4
	; X32-NEXT: cmpeqps %xmm2, %xmm1			; X32-NEXT: cmpeqps %xmm2, %xmm1
	; X32-NEXT: movaps %xmm1, %xmm0			; X32-NEXT: movaps %xmm1, %xmm0
	; X32-NEXT: blendvps %xmm0, %xmm2, %xmm4			; X32-NEXT: blendvps %xmm0, %xmm2, %xmm4
	; X32-NEXT: movss %xmm4, {{[0-9]+}}(%esp)			; X32-NEXT: movss %xmm4, {{[0-9]+}}(%esp)
	; X32-NEXT: movshdup {{.*#+}} xmm0 = xmm4[1,1,3,3]			; X32-NEXT: movshdup {{.*#+}} xmm0 = xmm4[1,1,3,3]
	; X32-NEXT: movss %xmm0, {{[0-9]+}}(%esp)			; X32-NEXT: movss %xmm0, {{[0-9]+}}(%esp)
	; X32-NEXT: movss %xmm4, {{[0-9]+}}(%esp)			; X32-NEXT: movss %xmm4, {{[0-9]+}}(%esp)
	; X32-NEXT: movss %xmm0, {{[0-9]+}}(%esp)			; X32-NEXT: movss %xmm0, {{[0-9]+}}(%esp)
	; X32-NEXT: addl $60, %esp			; X32-NEXT: addl $60, %esp
	; X32-NEXT: .cfi_def_cfa_offset 4			; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: full_test:			; X64-LABEL: full_test:
	; X64: # %bb.0: # %entry			; X64: # %bb.0: # %entry
	; X64-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero			; X64-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
	; X64-NEXT: roundps $11, %xmm2, %xmm1			; X64-NEXT: cvttps2dq %xmm2, %xmm0
				; X64-NEXT: cvtdq2ps %xmm0, %xmm1
	; X64-NEXT: xorps %xmm0, %xmm0			; X64-NEXT: xorps %xmm0, %xmm0
	; X64-NEXT: cmpltps %xmm2, %xmm0			; X64-NEXT: cmpltps %xmm2, %xmm0
	; X64-NEXT: movaps {{.*#+}} xmm3 = <1,1,u,u>			; X64-NEXT: movaps {{.*#+}} xmm3 = <1,1,u,u>
	; X64-NEXT: addps %xmm1, %xmm3			; X64-NEXT: addps %xmm1, %xmm3
	; X64-NEXT: movaps %xmm1, %xmm4			; X64-NEXT: movaps %xmm1, %xmm4
	; X64-NEXT: blendvps %xmm0, %xmm3, %xmm4			; X64-NEXT: blendvps %xmm0, %xmm3, %xmm4
	; X64-NEXT: cmpeqps %xmm2, %xmm1			; X64-NEXT: cmpeqps %xmm2, %xmm1
	; X64-NEXT: movaps %xmm1, %xmm0			; X64-NEXT: movaps %xmm1, %xmm0
	[25 lines not shown]

test/CodeGen/X86/avx-cvttp2si.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-- -mattr=avx | FileCheck %s --check-prefixes=AVX,AVX1			; RUN: llc < %s -mtriple=x86_64-- -mattr=avx | FileCheck %s --check-prefixes=AVX,AVX1
	; RUN: llc < %s -mtriple=x86_64-- -mattr=avx512f,avx512vl | FileCheck %s --check-prefixes=AVX,AVX512			; RUN: llc < %s -mtriple=x86_64-- -mattr=avx512f,avx512vl | FileCheck %s --check-prefixes=AVX,AVX512

	; PR37751 - https://bugs.llvm.org/show_bug.cgi?id=37751			; PR37751 - https://bugs.llvm.org/show_bug.cgi?id=37751
	; We can't combine into 'round' instructions because the behavior is different for out-of-range values.			; We can't combine into 'round' instructions because the behavior is different for out-of-range values.

	declare <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float>)			declare <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float>)
	declare <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double>)			declare <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double>)

	define <8 x float> @float_to_int_to_float_mem_v8f32(<8 x float>* %p) {			define <8 x float> @float_to_int_to_float_mem_v8f32(<8 x float>* %p) #0 {
	; AVX-LABEL: float_to_int_to_float_mem_v8f32:			; AVX-LABEL: float_to_int_to_float_mem_v8f32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vroundps $11, (%rdi), %ymm0			; AVX-NEXT: vroundps $11, (%rdi), %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load <8 x float>, <8 x float>* %p, align 16			%x = load <8 x float>, <8 x float>* %p, align 16
	%fptosi = tail call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %x)			%fptosi = tail call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %x)
	%sitofp = sitofp <8 x i32> %fptosi to <8 x float>			%sitofp = sitofp <8 x i32> %fptosi to <8 x float>
	ret <8 x float> %sitofp			ret <8 x float> %sitofp
	}			}

	define <8 x float> @float_to_int_to_float_reg_v8f32(<8 x float> %x) {			define <8 x float> @float_to_int_to_float_reg_v8f32(<8 x float> %x) #0 {
	; AVX-LABEL: float_to_int_to_float_reg_v8f32:			; AVX-LABEL: float_to_int_to_float_reg_v8f32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vroundps $11, %ymm0, %ymm0			; AVX-NEXT: vroundps $11, %ymm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%fptosi = tail call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %x)			%fptosi = tail call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %x)
	%sitofp = sitofp <8 x i32> %fptosi to <8 x float>			%sitofp = sitofp <8 x i32> %fptosi to <8 x float>
	ret <8 x float> %sitofp			ret <8 x float> %sitofp
	}			}

	define <4 x double> @float_to_int_to_float_mem_v4f64(<4 x double>* %p) {			define <4 x double> @float_to_int_to_float_mem_v4f64(<4 x double>* %p) #0 {
	; AVX-LABEL: float_to_int_to_float_mem_v4f64:			; AVX-LABEL: float_to_int_to_float_mem_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vroundpd $11, (%rdi), %ymm0			; AVX-NEXT: vroundpd $11, (%rdi), %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load <4 x double>, <4 x double>* %p, align 16			%x = load <4 x double>, <4 x double>* %p, align 16
	%fptosi = tail call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %x)			%fptosi = tail call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %x)
	%sitofp = sitofp <4 x i32> %fptosi to <4 x double>			%sitofp = sitofp <4 x i32> %fptosi to <4 x double>
	ret <4 x double> %sitofp			ret <4 x double> %sitofp
	}			}

	define <4 x double> @float_to_int_to_float_reg_v4f64(<4 x double> %x) {			define <4 x double> @float_to_int_to_float_reg_v4f64(<4 x double> %x) #0 {
	; AVX-LABEL: float_to_int_to_float_reg_v4f64:			; AVX-LABEL: float_to_int_to_float_reg_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vroundpd $11, %ymm0, %ymm0			; AVX-NEXT: vroundpd $11, %ymm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%fptosi = tail call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %x)			%fptosi = tail call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %x)
	%sitofp = sitofp <4 x i32> %fptosi to <4 x double>			%sitofp = sitofp <4 x i32> %fptosi to <4 x double>
	ret <4 x double> %sitofp			ret <4 x double> %sitofp
	}			}

				attributes #0 = { "no-signed-zeros-fp-math"="true" }

test/CodeGen/X86/ftrunc.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=SSE2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=SSE2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=SSE41			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=SSE41
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=AVX1			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=AVX1

	define float @trunc_unsigned_f32(float %x) nounwind {			define float @trunc_unsigned_f32(float %x) #0 {
	; SSE2-LABEL: trunc_unsigned_f32:			; SSE2-LABEL: trunc_unsigned_f32:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: cvttss2si %xmm0, %rax			; SSE2-NEXT: cvttss2si %xmm0, %rax
	; SSE2-NEXT: movl %eax, %eax			; SSE2-NEXT: movl %eax, %eax
	; SSE2-NEXT: xorps %xmm0, %xmm0			; SSE2-NEXT: xorps %xmm0, %xmm0
	; SSE2-NEXT: cvtsi2ssq %rax, %xmm0			; SSE2-NEXT: cvtsi2ssq %rax, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_unsigned_f32:			; SSE41-LABEL: trunc_unsigned_f32:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: roundss $11, %xmm0, %xmm0			; SSE41-NEXT: roundss $11, %xmm0, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_unsigned_f32:			; AVX1-LABEL: trunc_unsigned_f32:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vroundss $11, %xmm0, %xmm0, %xmm0			; AVX1-NEXT: vroundss $11, %xmm0, %xmm0, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptoui float %x to i32			%i = fptoui float %x to i32
	%r = uitofp i32 %i to float			%r = uitofp i32 %i to float
	ret float %r			ret float %r
	}			}

	define double @trunc_unsigned_f64(double %x) nounwind {			define double @trunc_unsigned_f64(double %x) #0 {
	; SSE2-LABEL: trunc_unsigned_f64:			; SSE2-LABEL: trunc_unsigned_f64:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE2-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; SSE2-NEXT: movapd %xmm0, %xmm2			; SSE2-NEXT: movapd %xmm0, %xmm2
	; SSE2-NEXT: subsd %xmm1, %xmm2			; SSE2-NEXT: subsd %xmm1, %xmm2
	; SSE2-NEXT: cvttsd2si %xmm2, %rax			; SSE2-NEXT: cvttsd2si %xmm2, %rax
	; SSE2-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000			; SSE2-NEXT: movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
	; SSE2-NEXT: xorq %rax, %rcx			; SSE2-NEXT: xorq %rax, %rcx
	[16 lines not shown]
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vroundsd $11, %xmm0, %xmm0, %xmm0			; AVX1-NEXT: vroundsd $11, %xmm0, %xmm0, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptoui double %x to i64			%i = fptoui double %x to i64
	%r = uitofp i64 %i to double			%r = uitofp i64 %i to double
	ret double %r			ret double %r
	}			}

	define <4 x float> @trunc_unsigned_v4f32(<4 x float> %x) nounwind {			define <4 x float> @trunc_unsigned_v4f32(<4 x float> %x) #0 {
	; SSE2-LABEL: trunc_unsigned_v4f32:			; SSE2-LABEL: trunc_unsigned_v4f32:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: movaps %xmm0, %xmm1			; SSE2-NEXT: movaps %xmm0, %xmm1
	; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,1],xmm0[2,3]			; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,1],xmm0[2,3]
	; SSE2-NEXT: cvttss2si %xmm1, %rax			; SSE2-NEXT: cvttss2si %xmm1, %rax
	; SSE2-NEXT: movd %eax, %xmm1			; SSE2-NEXT: movd %eax, %xmm1
	; SSE2-NEXT: movaps %xmm0, %xmm2			; SSE2-NEXT: movaps %xmm0, %xmm2
	; SSE2-NEXT: movhlps {{.*#+}} xmm2 = xmm0[1],xmm2[1]			; SSE2-NEXT: movhlps {{.*#+}} xmm2 = xmm0[1],xmm2[1]
	[26 lines not shown]
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vroundps $11, %xmm0, %xmm0			; AVX1-NEXT: vroundps $11, %xmm0, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptoui <4 x float> %x to <4 x i32>			%i = fptoui <4 x float> %x to <4 x i32>
	%r = uitofp <4 x i32> %i to <4 x float>			%r = uitofp <4 x i32> %i to <4 x float>
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <2 x double> @trunc_unsigned_v2f64(<2 x double> %x) nounwind {			define <2 x double> @trunc_unsigned_v2f64(<2 x double> %x) #0 {
	; SSE2-LABEL: trunc_unsigned_v2f64:			; SSE2-LABEL: trunc_unsigned_v2f64:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: movaps %xmm0, %xmm1			; SSE2-NEXT: movaps %xmm0, %xmm1
	; SSE2-NEXT: movhlps {{.*#+}} xmm1 = xmm0[1],xmm1[1]			; SSE2-NEXT: movhlps {{.*#+}} xmm1 = xmm0[1],xmm1[1]
	; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero			; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
	; SSE2-NEXT: movaps %xmm1, %xmm3			; SSE2-NEXT: movaps %xmm1, %xmm3
	; SSE2-NEXT: subsd %xmm2, %xmm3			; SSE2-NEXT: subsd %xmm2, %xmm3
	; SSE2-NEXT: cvttsd2si %xmm3, %rax			; SSE2-NEXT: cvttsd2si %xmm3, %rax
	[33 lines not shown]
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vroundpd $11, %xmm0, %xmm0			; AVX1-NEXT: vroundpd $11, %xmm0, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptoui <2 x double> %x to <2 x i64>			%i = fptoui <2 x double> %x to <2 x i64>
	%r = uitofp <2 x i64> %i to <2 x double>			%r = uitofp <2 x i64> %i to <2 x double>
	ret <2 x double> %r			ret <2 x double> %r
	}			}

	define <4 x double> @trunc_unsigned_v4f64(<4 x double> %x) nounwind {			define <4 x double> @trunc_unsigned_v4f64(<4 x double> %x) #0 {
	; SSE2-LABEL: trunc_unsigned_v4f64:			; SSE2-LABEL: trunc_unsigned_v4f64:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: movaps %xmm1, %xmm3			; SSE2-NEXT: movaps %xmm1, %xmm3
	; SSE2-NEXT: movhlps {{.*#+}} xmm3 = xmm1[1],xmm3[1]			; SSE2-NEXT: movhlps {{.*#+}} xmm3 = xmm1[1],xmm3[1]
	; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero			; SSE2-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
	; SSE2-NEXT: movaps %xmm3, %xmm4			; SSE2-NEXT: movaps %xmm3, %xmm4
	; SSE2-NEXT: subsd %xmm2, %xmm4			; SSE2-NEXT: subsd %xmm2, %xmm4
	; SSE2-NEXT: cvttsd2si %xmm4, %rcx			; SSE2-NEXT: cvttsd2si %xmm4, %rcx
	[61 lines not shown]
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vroundpd $11, %ymm0, %ymm0			; AVX1-NEXT: vroundpd $11, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptoui <4 x double> %x to <4 x i64>			%i = fptoui <4 x double> %x to <4 x i64>
	%r = uitofp <4 x i64> %i to <4 x double>			%r = uitofp <4 x i64> %i to <4 x double>
	ret <4 x double> %r			ret <4 x double> %r
	}			}

	define float @trunc_signed_f32(float %x) nounwind {			define float @trunc_signed_f32(float %x) #0 {
	; SSE2-LABEL: trunc_signed_f32:			; SSE2-LABEL: trunc_signed_f32:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: cvttss2si %xmm0, %eax			; SSE2-NEXT: cvttss2si %xmm0, %eax
	; SSE2-NEXT: xorps %xmm0, %xmm0			; SSE2-NEXT: xorps %xmm0, %xmm0
	; SSE2-NEXT: cvtsi2ssl %eax, %xmm0			; SSE2-NEXT: cvtsi2ssl %eax, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_signed_f32:			; SSE41-LABEL: trunc_signed_f32:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: roundss $11, %xmm0, %xmm0			; SSE41-NEXT: roundss $11, %xmm0, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_signed_f32:			; AVX1-LABEL: trunc_signed_f32:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vroundss $11, %xmm0, %xmm0, %xmm0			; AVX1-NEXT: vroundss $11, %xmm0, %xmm0, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptosi float %x to i32			%i = fptosi float %x to i32
	%r = sitofp i32 %i to float			%r = sitofp i32 %i to float
	ret float %r			ret float %r
	}			}

	define double @trunc_signed_f64(double %x) nounwind {			define double @trunc_signed_f64(double %x) #0 {
	; SSE2-LABEL: trunc_signed_f64:			; SSE2-LABEL: trunc_signed_f64:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: cvttsd2si %xmm0, %rax			; SSE2-NEXT: cvttsd2si %xmm0, %rax
	; SSE2-NEXT: xorps %xmm0, %xmm0			; SSE2-NEXT: xorps %xmm0, %xmm0
	; SSE2-NEXT: cvtsi2sdq %rax, %xmm0			; SSE2-NEXT: cvtsi2sdq %rax, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_signed_f64:			; SSE41-LABEL: trunc_signed_f64:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: roundsd $11, %xmm0, %xmm0			; SSE41-NEXT: roundsd $11, %xmm0, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_signed_f64:			; AVX1-LABEL: trunc_signed_f64:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vroundsd $11, %xmm0, %xmm0, %xmm0			; AVX1-NEXT: vroundsd $11, %xmm0, %xmm0, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptosi double %x to i64			%i = fptosi double %x to i64
	%r = sitofp i64 %i to double			%r = sitofp i64 %i to double
	ret double %r			ret double %r
	}			}

	define <4 x float> @trunc_signed_v4f32(<4 x float> %x) nounwind {			define <4 x float> @trunc_signed_v4f32(<4 x float> %x) #0 {
	; SSE2-LABEL: trunc_signed_v4f32:			; SSE2-LABEL: trunc_signed_v4f32:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: cvttps2dq %xmm0, %xmm0			; SSE2-NEXT: cvttps2dq %xmm0, %xmm0
	; SSE2-NEXT: cvtdq2ps %xmm0, %xmm0			; SSE2-NEXT: cvtdq2ps %xmm0, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc_signed_v4f32:			; SSE41-LABEL: trunc_signed_v4f32:
	; SSE41: # %bb.0:			; SSE41: # %bb.0:
	; SSE41-NEXT: roundps $11, %xmm0, %xmm0			; SSE41-NEXT: roundps $11, %xmm0, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc_signed_v4f32:			; AVX1-LABEL: trunc_signed_v4f32:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vroundps $11, %xmm0, %xmm0			; AVX1-NEXT: vroundps $11, %xmm0, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptosi <4 x float> %x to <4 x i32>			%i = fptosi <4 x float> %x to <4 x i32>
	%r = sitofp <4 x i32> %i to <4 x float>			%r = sitofp <4 x i32> %i to <4 x float>
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <2 x double> @trunc_signed_v2f64(<2 x double> %x) nounwind {			define <2 x double> @trunc_signed_v2f64(<2 x double> %x) #0 {
	; SSE2-LABEL: trunc_signed_v2f64:			; SSE2-LABEL: trunc_signed_v2f64:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: cvttsd2si %xmm0, %rax			; SSE2-NEXT: cvttsd2si %xmm0, %rax
	; SSE2-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]			; SSE2-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
	; SSE2-NEXT: cvttsd2si %xmm0, %rcx			; SSE2-NEXT: cvttsd2si %xmm0, %rcx
	; SSE2-NEXT: xorps %xmm0, %xmm0			; SSE2-NEXT: xorps %xmm0, %xmm0
	; SSE2-NEXT: cvtsi2sdq %rax, %xmm0			; SSE2-NEXT: cvtsi2sdq %rax, %xmm0
	; SSE2-NEXT: cvtsi2sdq %rcx, %xmm1			; SSE2-NEXT: cvtsi2sdq %rcx, %xmm1
	[9 lines not shown]
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vroundpd $11, %xmm0, %xmm0			; AVX1-NEXT: vroundpd $11, %xmm0, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptosi <2 x double> %x to <2 x i64>			%i = fptosi <2 x double> %x to <2 x i64>
	%r = sitofp <2 x i64> %i to <2 x double>			%r = sitofp <2 x i64> %i to <2 x double>
	ret <2 x double> %r			ret <2 x double> %r
	}			}

	define <4 x double> @trunc_signed_v4f64(<4 x double> %x) nounwind {			define <4 x double> @trunc_signed_v4f64(<4 x double> %x) #0 {
	; SSE2-LABEL: trunc_signed_v4f64:			; SSE2-LABEL: trunc_signed_v4f64:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: cvttsd2si %xmm1, %rax			; SSE2-NEXT: cvttsd2si %xmm1, %rax
	; SSE2-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]			; SSE2-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]
	; SSE2-NEXT: cvttsd2si %xmm1, %rcx			; SSE2-NEXT: cvttsd2si %xmm1, %rcx
	; SSE2-NEXT: cvttsd2si %xmm0, %rdx			; SSE2-NEXT: cvttsd2si %xmm0, %rdx
	; SSE2-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]			; SSE2-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
	; SSE2-NEXT: cvttsd2si %xmm0, %rsi			; SSE2-NEXT: cvttsd2si %xmm0, %rsi
	[74 lines not shown]
	; AVX1-NEXT: vcvttsd2si %xmm0, %rax			; AVX1-NEXT: vcvttsd2si %xmm0, %rax
	; AVX1-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm0			; AVX1-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	%i = fptosi double %x to i64			%i = fptosi double %x to i64
	%r = sitofp i64 %i to double			%r = sitofp i64 %i to double
	ret double %r			ret double %r
	}			}

	attributes #1 = { nounwind "strict-float-cast-overflow"="false" }			attributes #0 = { nounwind "no-signed-zeros-fp-math"="true" }
				attributes #1 = { nounwind "no-signed-zeros-fp-math"="true" "strict-float-cast-overflow"="false" }
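
The ftrunc.ll changes above add "no-signed-zeros-fp-math"="true" to the functions that are still expected to fold to a round instruction. The following is a minimal C sketch of why that condition matters, not the DAGCombiner code itself. The names sign_bit, cast_round_trip, and ftrunc_model are hypothetical, and the sketch assumes float is a 32-bit IEEE-754 type. ftrunc_model only imitates a truncating round by copying the input's sign onto the round-trip result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the sign bit of f is set, including for -0.0f. */
static int sign_bit(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (int)(bits >> 31);
}

/* The original sequence: fptosi followed by sitofp.
   Assumed precondition: x fits in int, because an out-of-range
   float-to-int cast is undefined behavior in C. */
static float cast_round_trip(float x) {
    assert(x > -2147483648.0f && x < 2147483648.0f);
    return (float)(int)x;
}

/* Hypothetical model of the folded form: truncate toward zero but keep
   the sign of the input, as an IEEE-754 truncation does. */
static float ftrunc_model(float x) {
    float r = cast_round_trip(x);
    uint32_t xb, rb;
    memcpy(&xb, &x, sizeof xb);
    memcpy(&rb, &r, sizeof rb);
    rb = (rb & 0x7fffffffu) | (xb & 0x80000000u); /* copy sign of x */
    memcpy(&r, &rb, sizeof r);
    return r;
}
```

For x = -0.8f, cast_round_trip returns +0.0 while ftrunc_model returns -0.0. The two values compare equal with ==, so the difference is visible only through the sign bit (or through formatted output, as in the reproducer in the summary). For inputs whose truncated value is nonzero, the two functions agree.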

test/CodeGen/X86/sse-cvttp2si.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-- -mattr=sse4.1 | FileCheck %s --check-prefixes=SSE			; RUN: llc < %s -mtriple=x86_64-- -mattr=sse4.1 | FileCheck %s --check-prefixes=SSE
	; RUN: llc < %s -mtriple=x86_64-- -mattr=avx | FileCheck %s --check-prefixes=AVX,AVX1			; RUN: llc < %s -mtriple=x86_64-- -mattr=avx | FileCheck %s --check-prefixes=AVX,AVX1
	; RUN: llc < %s -mtriple=x86_64-- -mattr=avx512f,avx512vl | FileCheck %s --check-prefixes=AVX,AVX512			; RUN: llc < %s -mtriple=x86_64-- -mattr=avx512f,avx512vl | FileCheck %s --check-prefixes=AVX,AVX512

	; PR37751 - https://bugs.llvm.org/show_bug.cgi?id=37751			; PR37751 - https://bugs.llvm.org/show_bug.cgi?id=37751
	; We can't combine into 'round' instructions because the behavior is different for out-of-range values.			; We can't combine into 'round' instructions because the behavior is different for out-of-range values.

	declare <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float>)			declare <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float>)
	declare <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double>)			declare <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double>)
	declare i32 @llvm.x86.sse.cvttss2si(<4 x float>)			declare i32 @llvm.x86.sse.cvttss2si(<4 x float>)
	declare i64 @llvm.x86.sse.cvttss2si64(<4 x float>)			declare i64 @llvm.x86.sse.cvttss2si64(<4 x float>)
	declare i32 @llvm.x86.sse2.cvttsd2si(<2 x double>)			declare i32 @llvm.x86.sse2.cvttsd2si(<2 x double>)
	declare i64 @llvm.x86.sse2.cvttsd2si64(<2 x double>)			declare i64 @llvm.x86.sse2.cvttsd2si64(<2 x double>)

	define float @float_to_int_to_float_mem_f32_i32(<4 x float>* %p) {			define float @float_to_int_to_float_mem_f32_i32(<4 x float>* %p) #0 {
	; SSE-LABEL: float_to_int_to_float_mem_f32_i32:			; SSE-LABEL: float_to_int_to_float_mem_f32_i32:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttss2si (%rdi), %eax			; SSE-NEXT: cvttss2si (%rdi), %eax
	; SSE-NEXT: cvtsi2ssl %eax, %xmm0			; SSE-NEXT: cvtsi2ssl %eax, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_mem_f32_i32:			; AVX-LABEL: float_to_int_to_float_mem_f32_i32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttss2si (%rdi), %eax			; AVX-NEXT: vcvttss2si (%rdi), %eax
	; AVX-NEXT: vcvtsi2ssl %eax, %xmm0, %xmm0			; AVX-NEXT: vcvtsi2ssl %eax, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load <4 x float>, <4 x float>* %p, align 16			%x = load <4 x float>, <4 x float>* %p, align 16
	%fptosi = tail call i32 @llvm.x86.sse.cvttss2si(<4 x float> %x)			%fptosi = tail call i32 @llvm.x86.sse.cvttss2si(<4 x float> %x)
	%sitofp = sitofp i32 %fptosi to float			%sitofp = sitofp i32 %fptosi to float
	ret float %sitofp			ret float %sitofp
	}			}

	define float @float_to_int_to_float_reg_f32_i32(<4 x float> %x) {			define float @float_to_int_to_float_reg_f32_i32(<4 x float> %x) #0 {
	; SSE-LABEL: float_to_int_to_float_reg_f32_i32:			; SSE-LABEL: float_to_int_to_float_reg_f32_i32:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttss2si %xmm0, %eax			; SSE-NEXT: cvttss2si %xmm0, %eax
	; SSE-NEXT: xorps %xmm0, %xmm0			; SSE-NEXT: xorps %xmm0, %xmm0
	; SSE-NEXT: cvtsi2ssl %eax, %xmm0			; SSE-NEXT: cvtsi2ssl %eax, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_reg_f32_i32:			; AVX-LABEL: float_to_int_to_float_reg_f32_i32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttss2si %xmm0, %eax			; AVX-NEXT: vcvttss2si %xmm0, %eax
	; AVX-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0			; AVX-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%fptosi = tail call i32 @llvm.x86.sse.cvttss2si(<4 x float> %x)			%fptosi = tail call i32 @llvm.x86.sse.cvttss2si(<4 x float> %x)
	%sitofp = sitofp i32 %fptosi to float			%sitofp = sitofp i32 %fptosi to float
	ret float %sitofp			ret float %sitofp
	}			}

	define float @float_to_int_to_float_mem_f32_i64(<4 x float>* %p) {			define float @float_to_int_to_float_mem_f32_i64(<4 x float>* %p) #0 {
	; SSE-LABEL: float_to_int_to_float_mem_f32_i64:			; SSE-LABEL: float_to_int_to_float_mem_f32_i64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttss2si (%rdi), %rax			; SSE-NEXT: cvttss2si (%rdi), %rax
	; SSE-NEXT: cvtsi2ssq %rax, %xmm0			; SSE-NEXT: cvtsi2ssq %rax, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_mem_f32_i64:			; AVX-LABEL: float_to_int_to_float_mem_f32_i64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttss2si (%rdi), %rax			; AVX-NEXT: vcvttss2si (%rdi), %rax
	; AVX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0			; AVX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load <4 x float>, <4 x float>* %p, align 16			%x = load <4 x float>, <4 x float>* %p, align 16
	%fptosi = tail call i64 @llvm.x86.sse.cvttss2si64(<4 x float> %x)			%fptosi = tail call i64 @llvm.x86.sse.cvttss2si64(<4 x float> %x)
	%sitofp = sitofp i64 %fptosi to float			%sitofp = sitofp i64 %fptosi to float
	ret float %sitofp			ret float %sitofp
	}			}

	define float @float_to_int_to_float_reg_f32_i64(<4 x float> %x) {			define float @float_to_int_to_float_reg_f32_i64(<4 x float> %x) #0 {
	; SSE-LABEL: float_to_int_to_float_reg_f32_i64:			; SSE-LABEL: float_to_int_to_float_reg_f32_i64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttss2si %xmm0, %rax			; SSE-NEXT: cvttss2si %xmm0, %rax
	; SSE-NEXT: xorps %xmm0, %xmm0			; SSE-NEXT: xorps %xmm0, %xmm0
	; SSE-NEXT: cvtsi2ssq %rax, %xmm0			; SSE-NEXT: cvtsi2ssq %rax, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_reg_f32_i64:			; AVX-LABEL: float_to_int_to_float_reg_f32_i64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttss2si %xmm0, %rax			; AVX-NEXT: vcvttss2si %xmm0, %rax
	; AVX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm0			; AVX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%fptosi = tail call i64 @llvm.x86.sse.cvttss2si64(<4 x float> %x)			%fptosi = tail call i64 @llvm.x86.sse.cvttss2si64(<4 x float> %x)
	%sitofp = sitofp i64 %fptosi to float			%sitofp = sitofp i64 %fptosi to float
	ret float %sitofp			ret float %sitofp
	}			}

	define double @float_to_int_to_float_mem_f64_i32(<2 x double>* %p) {			define double @float_to_int_to_float_mem_f64_i32(<2 x double>* %p) #0 {
	; SSE-LABEL: float_to_int_to_float_mem_f64_i32:			; SSE-LABEL: float_to_int_to_float_mem_f64_i32:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttsd2si (%rdi), %eax			; SSE-NEXT: cvttsd2si (%rdi), %eax
	; SSE-NEXT: cvtsi2sdl %eax, %xmm0			; SSE-NEXT: cvtsi2sdl %eax, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_mem_f64_i32:			; AVX-LABEL: float_to_int_to_float_mem_f64_i32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttsd2si (%rdi), %eax			; AVX-NEXT: vcvttsd2si (%rdi), %eax
	; AVX-NEXT: vcvtsi2sdl %eax, %xmm0, %xmm0			; AVX-NEXT: vcvtsi2sdl %eax, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load <2 x double>, <2 x double>* %p, align 16			%x = load <2 x double>, <2 x double>* %p, align 16
	%fptosi = tail call i32 @llvm.x86.sse2.cvttsd2si(<2 x double> %x)			%fptosi = tail call i32 @llvm.x86.sse2.cvttsd2si(<2 x double> %x)
	%sitofp = sitofp i32 %fptosi to double			%sitofp = sitofp i32 %fptosi to double
	ret double %sitofp			ret double %sitofp
	}			}

	define double @float_to_int_to_float_reg_f64_i32(<2 x double> %x) {			define double @float_to_int_to_float_reg_f64_i32(<2 x double> %x) #0 {
	; SSE-LABEL: float_to_int_to_float_reg_f64_i32:			; SSE-LABEL: float_to_int_to_float_reg_f64_i32:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttsd2si %xmm0, %eax			; SSE-NEXT: cvttsd2si %xmm0, %eax
	; SSE-NEXT: xorps %xmm0, %xmm0			; SSE-NEXT: xorps %xmm0, %xmm0
	; SSE-NEXT: cvtsi2sdl %eax, %xmm0			; SSE-NEXT: cvtsi2sdl %eax, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_reg_f64_i32:			; AVX-LABEL: float_to_int_to_float_reg_f64_i32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttsd2si %xmm0, %eax			; AVX-NEXT: vcvttsd2si %xmm0, %eax
	; AVX-NEXT: vcvtsi2sdl %eax, %xmm1, %xmm0			; AVX-NEXT: vcvtsi2sdl %eax, %xmm1, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%fptosi = tail call i32 @llvm.x86.sse2.cvttsd2si(<2 x double> %x)			%fptosi = tail call i32 @llvm.x86.sse2.cvttsd2si(<2 x double> %x)
	%sitofp = sitofp i32 %fptosi to double			%sitofp = sitofp i32 %fptosi to double
	ret double %sitofp			ret double %sitofp
	}			}

	define double @float_to_int_to_float_mem_f64_i64(<2 x double>* %p) {			define double @float_to_int_to_float_mem_f64_i64(<2 x double>* %p) #0 {
	; SSE-LABEL: float_to_int_to_float_mem_f64_i64:			; SSE-LABEL: float_to_int_to_float_mem_f64_i64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttsd2si (%rdi), %rax			; SSE-NEXT: cvttsd2si (%rdi), %rax
	; SSE-NEXT: cvtsi2sdq %rax, %xmm0			; SSE-NEXT: cvtsi2sdq %rax, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_mem_f64_i64:			; AVX-LABEL: float_to_int_to_float_mem_f64_i64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttsd2si (%rdi), %rax			; AVX-NEXT: vcvttsd2si (%rdi), %rax
	; AVX-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0			; AVX-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load <2 x double>, <2 x double>* %p, align 16			%x = load <2 x double>, <2 x double>* %p, align 16
	%fptosi = tail call i64 @llvm.x86.sse2.cvttsd2si64(<2 x double> %x)			%fptosi = tail call i64 @llvm.x86.sse2.cvttsd2si64(<2 x double> %x)
	%sitofp = sitofp i64 %fptosi to double			%sitofp = sitofp i64 %fptosi to double
	ret double %sitofp			ret double %sitofp
	}			}

	define double @float_to_int_to_float_reg_f64_i64(<2 x double> %x) {			define double @float_to_int_to_float_reg_f64_i64(<2 x double> %x) #0 {
	; SSE-LABEL: float_to_int_to_float_reg_f64_i64:			; SSE-LABEL: float_to_int_to_float_reg_f64_i64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttsd2si %xmm0, %rax			; SSE-NEXT: cvttsd2si %xmm0, %rax
	; SSE-NEXT: xorps %xmm0, %xmm0			; SSE-NEXT: xorps %xmm0, %xmm0
	; SSE-NEXT: cvtsi2sdq %rax, %xmm0			; SSE-NEXT: cvtsi2sdq %rax, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_reg_f64_i64:			; AVX-LABEL: float_to_int_to_float_reg_f64_i64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttsd2si %xmm0, %rax			; AVX-NEXT: vcvttsd2si %xmm0, %rax
	; AVX-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm0			; AVX-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%fptosi = tail call i64 @llvm.x86.sse2.cvttsd2si64(<2 x double> %x)			%fptosi = tail call i64 @llvm.x86.sse2.cvttsd2si64(<2 x double> %x)
	%sitofp = sitofp i64 %fptosi to double			%sitofp = sitofp i64 %fptosi to double
	ret double %sitofp			ret double %sitofp
	}			}

	define <4 x float> @float_to_int_to_float_mem_v4f32(<4 x float>* %p) {			define <4 x float> @float_to_int_to_float_mem_v4f32(<4 x float>* %p) #0 {
	; SSE-LABEL: float_to_int_to_float_mem_v4f32:			; SSE-LABEL: float_to_int_to_float_mem_v4f32:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: roundps $11, (%rdi), %xmm0			; SSE-NEXT: roundps $11, (%rdi), %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_mem_v4f32:			; AVX-LABEL: float_to_int_to_float_mem_v4f32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vroundps $11, (%rdi), %xmm0			; AVX-NEXT: vroundps $11, (%rdi), %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load <4 x float>, <4 x float>* %p, align 16			%x = load <4 x float>, <4 x float>* %p, align 16
	%fptosi = tail call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %x)			%fptosi = tail call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %x)
	%sitofp = sitofp <4 x i32> %fptosi to <4 x float>			%sitofp = sitofp <4 x i32> %fptosi to <4 x float>
	ret <4 x float> %sitofp			ret <4 x float> %sitofp
	}			}

	define <4 x float> @float_to_int_to_float_reg_v4f32(<4 x float> %x) {			define <4 x float> @float_to_int_to_float_reg_v4f32(<4 x float> %x) #0 {
	; SSE-LABEL: float_to_int_to_float_reg_v4f32:			; SSE-LABEL: float_to_int_to_float_reg_v4f32:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: roundps $11, %xmm0, %xmm0			; SSE-NEXT: roundps $11, %xmm0, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_reg_v4f32:			; AVX-LABEL: float_to_int_to_float_reg_v4f32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vroundps $11, %xmm0, %xmm0			; AVX-NEXT: vroundps $11, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%fptosi = tail call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %x)			%fptosi = tail call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %x)
	%sitofp = sitofp <4 x i32> %fptosi to <4 x float>			%sitofp = sitofp <4 x i32> %fptosi to <4 x float>
	ret <4 x float> %sitofp			ret <4 x float> %sitofp
	}			}

	define <2 x double> @float_to_int_to_float_mem_v2f64(<2 x double>* %p) {			define <2 x double> @float_to_int_to_float_mem_v2f64(<2 x double>* %p) #0 {
	; SSE-LABEL: float_to_int_to_float_mem_v2f64:			; SSE-LABEL: float_to_int_to_float_mem_v2f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttpd2dq (%rdi), %xmm0			; SSE-NEXT: cvttpd2dq (%rdi), %xmm0
	; SSE-NEXT: cvtdq2pd %xmm0, %xmm0			; SSE-NEXT: cvtdq2pd %xmm0, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_mem_v2f64:			; AVX-LABEL: float_to_int_to_float_mem_v2f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttpd2dqx (%rdi), %xmm0			; AVX-NEXT: vcvttpd2dqx (%rdi), %xmm0
	; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0			; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load <2 x double>, <2 x double>* %p, align 16			%x = load <2 x double>, <2 x double>* %p, align 16
	%fptosi = tail call <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double> %x)			%fptosi = tail call <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double> %x)
	%concat = shufflevector <4 x i32> %fptosi, <4 x i32> undef, <2 x i32> <i32 0, i32 1>			%concat = shufflevector <4 x i32> %fptosi, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
	%sitofp = sitofp <2 x i32> %concat to <2 x double>			%sitofp = sitofp <2 x i32> %concat to <2 x double>
	ret <2 x double> %sitofp			ret <2 x double> %sitofp
	}			}

	define <2 x double> @float_to_int_to_float_reg_v2f64(<2 x double> %x) {			define <2 x double> @float_to_int_to_float_reg_v2f64(<2 x double> %x) #0 {
	; SSE-LABEL: float_to_int_to_float_reg_v2f64:			; SSE-LABEL: float_to_int_to_float_reg_v2f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttpd2dq %xmm0, %xmm0			; SSE-NEXT: cvttpd2dq %xmm0, %xmm0
	; SSE-NEXT: cvtdq2pd %xmm0, %xmm0			; SSE-NEXT: cvtdq2pd %xmm0, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: float_to_int_to_float_reg_v2f64:			; AVX-LABEL: float_to_int_to_float_reg_v2f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttpd2dq %xmm0, %xmm0			; AVX-NEXT: vcvttpd2dq %xmm0, %xmm0
	; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0			; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%fptosi = tail call <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double> %x)			%fptosi = tail call <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double> %x)
	%concat = shufflevector <4 x i32> %fptosi, <4 x i32> undef, <2 x i32> <i32 0, i32 1>			%concat = shufflevector <4 x i32> %fptosi, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
	%sitofp = sitofp <2 x i32> %concat to <2 x double>			%sitofp = sitofp <2 x i32> %concat to <2 x double>
	ret <2 x double> %sitofp			ret <2 x double> %sitofp
	}			}

				attributes #0 = { "no-signed-zeros-fp-math"="true" }
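
The PR37751 comment in sse-cvttp2si.ll says the x86 conversion intrinsics behave differently from a round instruction for out-of-range values. The following is a rough C sketch of that difference under stated assumptions. cvttss2si_model and intrinsic_round_trip are hypothetical names. The model assumes the documented x86 behavior in which the 32-bit truncating conversion yields the "integer indefinite" value 0x80000000 when the source is NaN or does not fit.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical model of the 32-bit cvttss2si result: truncate toward zero
   when the value fits in int, otherwise return the x86 "integer
   indefinite" value 0x80000000 (INT_MIN). NaN fails both range
   comparisons and so also takes the indefinite path. */
static int cvttss2si_model(float x) {
    if (x >= -2147483648.0f && x < 2147483648.0f)
        return (int)x;
    return INT_MIN;
}

/* The intrinsic followed by sitofp, as in the scalar tests above. */
static float intrinsic_round_trip(float x) {
    return (float)cvttss2si_model(x);
}
```

With this model, intrinsic_round_trip(3e9f) is -2147483648.0f, whereas a truncating round of 3e9f would leave the value unchanged because it is already integral. That mismatch is a reason to keep the scalar intrinsic tests as a convert pair even when signed zeros are ignored.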

Revision Contents

Diff 150979