This is an archive of the discontinued LLVM Phabricator instance.

[SDag] SimplifyDemandedBits: simplify to FP constant if all bits known
ClosedPublic

Authored by foad on Sep 30 2020, 7:13 AM.

Details

Reviewers

RKSimon
nikic
craig.topper
spatel
t.p.northover
dmgreen
efriedma
MeeraN
SjoerdMeijer

Commits

rG1aa8e6a51a0e: [SDag] SimplifyDemandedBits: simplify to FP constant if all bits known

Summary

We were already doing this for integer constants. This patch implements
the same thing for floating point constants.
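As a rough illustration of the idea in the summary (not the patch itself): when the known-zero and known-one masks together cover every demanded bit, the value is fully determined and can be replaced by a constant whose bit pattern is the known-one mask. The sketch below models this for a 32-bit float. `KnownBits32` and `foldToFloatConstant` are hypothetical names standing in for LLVM's `KnownBits` and the fold inside `SimplifyDemandedBits`.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

// Hypothetical stand-in for llvm::KnownBits on a 32-bit value.
struct KnownBits32 {
  uint32_t Zero; // bits known to be 0
  uint32_t One;  // bits known to be 1
};

// If every demanded bit is known, write the constant to Out and return true.
// Bits that are unknown but not demanded are taken as 0, because the
// constant is built from Known.One alone.
bool foldToFloatConstant(KnownBits32 Known, uint32_t DemandedBits, float &Out) {
  if ((DemandedBits & ~(Known.Zero | Known.One)) != 0)
    return false; // at least one demanded bit is still unknown
  std::memcpy(&Out, &Known.One, sizeof Out); // reinterpret the bits as a float
  return true;
}
```

For example, with `One = 0x3f800000` and `Zero = ~One`, the fold succeeds and yields `1.0f`. With the same `One` but `Zero = 0`, it fails because the remaining bits are unknown.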

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. Sep 30 2020, 7:13 AM

Herald added a project: Restricted Project. Sep 30 2020, 7:13 AM

Herald added subscribers: llvm-commits, hiraditya.

foad requested review of this revision. Sep 30 2020, 7:13 AM

foad added a parent revision: D88569: [DAGCombiner] Call SimplifyDemandedBits to simplify EXTRACT_VECTOR_ELT. Sep 30 2020, 7:13 AM

foad mentioned this in D87502: [DAGCombiner] Use known bits to fold extract_vector_elt with const index.

Harbormaster completed remote builds in B73506: Diff 295271. Sep 30 2020, 7:26 AM

craig.topper added inline comments. Sep 30 2020, 10:00 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
2263–2264	Would fixing this FIXME do the same thing?

foad added inline comments. Sep 30 2020, 11:12 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
2263–2264	Aha! I thought this should be done centrally somewhere. I'll give this a try, thanks.

Reimplement as suggested.

Harbormaster completed remote builds in B73665: Diff 295591. Oct 1 2020, 9:04 AM

foad retitled this revision from [TargetLowering] SimplifyDemandedBits EXTRACT_VECTOR_ELT -> constant to [SDag] SimplifyDemandedBits: simplify to FP constant if all bits known. Oct 1 2020, 9:05 AM

foad edited the summary of this revision.

foad removed a parent revision: D88569: [DAGCombiner] Call SimplifyDemandedBits to simplify EXTRACT_VECTOR_ELT.

foad mentioned this in D88569: [DAGCombiner] Call SimplifyDemandedBits to simplify EXTRACT_VECTOR_ELT. Oct 1 2020, 9:15 AM

foad added a child revision: D88569: [DAGCombiner] Call SimplifyDemandedBits to simplify EXTRACT_VECTOR_ELT. Oct 1 2020, 9:15 AM

This looks like a clear win to me on all the affected test cases except for one slight regression noted inline.

llvm/test/CodeGen/ARM/fcopysign.ll
98–101	I'm not an ARM expert but this slight regression looks like it's just bad luck in the register allocator. If the vmov went into d1 then we could use vbit instead of vorr+vbsl.

Herald added a subscriber: pengfei. Oct 5 2020, 3:07 AM

(adding more potential ARM reviewers)

RKSimon added inline comments. Oct 5 2020, 11:33 AM

llvm/test/CodeGen/X86/uint_to_fp-2.ll
11	@craig.topper @spatel This looks like we're loading the same constant as (double x) and <double x, double ???> - do you know if we already have examples of this or is this a new type of regression?

dmgreen added inline comments. Oct 5 2020, 11:56 PM

llvm/test/CodeGen/ARM/fcopysign.ll
98–101	Yeah this can happen... It's a bit of a shame but doesn't look like the fault of this patch, exactly. BSL/BIT/BIF are selected best-effort using a pseudo.
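For readers unfamiliar with the three NEON bitwise-select instructions mentioned above, here is a minimal sketch of their per-lane semantics as documented in the Arm architecture reference. All three compute the same kind of select and differ only in which operand doubles as the destination, which is why the choice between them depends on where the register allocator placed the values. The `copysignVia*` helpers are illustrative names, and the 64-bit shift that the test uses to line up the sign of a double is left out.

```cpp
#include <cstdint>
#include <cstring>

// One 32-bit lane of each instruction; d is the destination's old value.
uint32_t vbsl(uint32_t d, uint32_t n, uint32_t m) { return (n & d) | (m & ~d); } // d holds the mask
uint32_t vbit(uint32_t d, uint32_t n, uint32_t m) { return (n & m) | (d & ~m); } // insert n where m is set
uint32_t vbif(uint32_t d, uint32_t n, uint32_t m) { return (d & m) | (n & ~m); } // insert n where m is clear

static uint32_t bitsOf(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

const uint32_t SignMask = 0x80000000u;

// Destination already holds the magnitude: a single vbit is enough.
uint32_t copysignViaVbit(float mag, float sign) {
  return vbit(bitsOf(mag), bitsOf(sign), SignMask);
}

// Destination must first be loaded with the mask (the vorr in the test),
// then vbsl selects between sign and magnitude.
uint32_t copysignViaVbsl(float mag, float sign) {
  return vbsl(SignMask, bitsOf(sign), bitsOf(mag));
}
```

Both helpers produce the same bits. The second form needs the extra move only because the mask, rather than the magnitude, has to sit in the destination register.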

spatel added inline comments. Oct 6 2020, 8:35 AM

llvm/test/CodeGen/X86/uint_to_fp-2.ll

We might avoid that by checking the number of uses of the constant?

Looks like a general shortcoming of combining/hoisting constants:

$ cat vec_const.ll 
define <4 x i32> @f(<4 x i32> %x) {
  %a1 = add <4 x i32> %x, <i32 1, i32 2, i32 3, i32 4>
  %a2 = xor <4 x i32> %a1, <i32 1, i32 2, i32 3, i32 undef>
  ret <4 x i32> %a2
}
$ llc -o - vec_const.ll 
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 15
	.section	__TEXT,__literal16,16byte_literals
	.p2align	4                               ## -- Begin function f
LCPI0_0:
	.long	1                               ## 0x1
	.long	2                               ## 0x2
	.long	3                               ## 0x3
	.long	4                               ## 0x4
LCPI0_1:
	.long	1                               ## 0x1
	.long	2                               ## 0x2
	.long	3                               ## 0x3
	.space	4
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_f
	.p2align	4, 0x90
_f:                                     ## @f
	.cfi_startproc
## %bb.0:
	paddd	LCPI0_0(%rip), %xmm0
	pxor	LCPI0_1(%rip), %xmm0
	retq
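The two pool entries above differ only in one undef lane. One way to picture the missed sharing is a compatibility check that treats undef lanes as wildcards. The sketch below is purely illustrative: the `Lane` type and `canReuse` are hypothetical, and this is not a description of how LLVM's constant pool is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One lane of a vector constant; IsUndef means "any value will do".
struct Lane {
  bool IsUndef;
  int32_t Value;
};
using VecConst = std::vector<Lane>;

// Returns true if Existing could be loaded wherever Wanted is needed:
// every lane that Wanted defines must be defined to the same value in
// Existing. Lanes that Wanted leaves undef match anything.
bool canReuse(const VecConst &Existing, const VecConst &Wanted) {
  if (Existing.size() != Wanted.size())
    return false;
  for (std::size_t I = 0; I < Wanted.size(); ++I) {
    if (Wanted[I].IsUndef)
      continue; // wildcard lane
    if (Existing[I].IsUndef || Existing[I].Value != Wanted[I].Value)
      return false;
  }
  return true;
}
```

Under this check, `<1, 2, 3, 4>` can stand in for `<1, 2, 3, undef>`, but not the other way round, so the example would need only the first pool entry.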

RKSimon added inline comments. Oct 6 2020, 11:43 AM

llvm/test/CodeGen/X86/uint_to_fp-2.ll
11	Thanks, I've raised a bug at https://bugs.llvm.org/show_bug.cgi?id=47744

LGTM, cheers - looks like the minor arm/x86 issues are already common and nothing really attributable to this patch.

This revision is now accepted and ready to land. Oct 7 2020, 1:03 AM

This revision was landed with ongoing or failed builds. Oct 7 2020, 1:31 AM

Closed by commit rG1aa8e6a51a0e: [SDag] SimplifyDemandedBits: simplify to FP constant if all bits known (authored by foad).

This revision was automatically updated to reflect the committed changes.

foad added a commit: rG1aa8e6a51a0e: [SDag] SimplifyDemandedBits: simplify to FP constant if all bits known.

Revision Contents

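Path                                                        Size
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp            13 lines
llvm/test/CodeGen/ARM/fcopysign.ll                          5 lines
llvm/test/CodeGen/X86/combine-bextr.ll                      10 lines
llvm/test/CodeGen/X86/copysign-constant-magnitude.ll        12 lines
llvm/test/CodeGen/X86/fp-intrinsics.ll                      19 lines
llvm/test/CodeGen/X86/fp-round.ll                           14 lines
llvm/test/CodeGen/X86/fp-strict-scalar-inttofp.ll           33 lines
llvm/test/CodeGen/X86/fp128-cast.ll                         3 lines
llvm/test/CodeGen/X86/scalar-int-to-fp.ll                   19 lines
llvm/test/CodeGen/X86/uint_to_fp-2.ll                       15 lines
llvm/test/CodeGen/X86/vector-shuffle-combining.ll           46 lines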

Diff 296617

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

bool TargetLowering::SimplifyDemandedBits(
[...]

if (Op.getOpcode() == ISD::Constant) {		if (Op.getOpcode() == ISD::Constant) {
// We know all of the bits for a constant!		// We know all of the bits for a constant!
Known.One = cast<ConstantSDNode>(Op)->getAPIntValue();		Known.One = cast<ConstantSDNode>(Op)->getAPIntValue();
Known.Zero = ~Known.One;		Known.Zero = ~Known.One;
return false;		return false;
}		}

		if (Op.getOpcode() == ISD::ConstantFP) {
		// We know all of the bits for a floating point constant!
		Known.One = cast<ConstantFPSDNode>(Op)->getValueAPF().bitcastToAPInt();
		Known.Zero = ~Known.One;
		return false;
		}

// Other users may use these bits.		// Other users may use these bits.
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
if (!Op.getNode()->hasOneUse() && !AssumeSingleUse) {		if (!Op.getNode()->hasOneUse() && !AssumeSingleUse) {
if (Depth != 0) {		if (Depth != 0) {
// If not at the root, Just compute the Known bits to		// If not at the root, Just compute the Known bits to
// simplify things downstream.		// simplify things downstream.
Known = TLO.DAG.computeKnownBits(Op, DemandedElts, Depth);		Known = TLO.DAG.computeKnownBits(Op, DemandedElts, Depth);
return false;		return false;
[...]
if (DemandedBits.isSubsetOf(Known.Zero | Known.One)) {
const SDNode *N = Op.getNode();		const SDNode *N = Op.getNode();
for (SDNodeIterator I = SDNodeIterator::begin(N),		for (SDNodeIterator I = SDNodeIterator::begin(N),
E = SDNodeIterator::end(N);		E = SDNodeIterator::end(N);
I != E; ++I) {		I != E; ++I) {
SDNode *Op = *I;		SDNode *Op = *I;
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Op))		if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Op))
if (C->isOpaque())		if (C->isOpaque())
return false;		return false;
}		}
// TODO: Handle float bits as well.
if (VT.isInteger())		if (VT.isInteger())
return TLO.CombineTo(Op, TLO.DAG.getConstant(Known.One, dl, VT));		return TLO.CombineTo(Op, TLO.DAG.getConstant(Known.One, dl, VT));
		if (VT.isFloatingPoint())
		return TLO.CombineTo(
		Op,
		TLO.DAG.getConstantFP(
		APFloat(TLO.DAG.EVTToAPFloatSemantics(VT), Known.One), dl, VT));
}		}

return false;		return false;
}		}

bool TargetLowering::SimplifyDemandedVectorElts(SDValue Op,		bool TargetLowering::SimplifyDemandedVectorElts(SDValue Op,
const APInt &DemandedElts,		const APInt &DemandedElts,
APInt &KnownUndef,		APInt &KnownUndef,
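The `Known.Zero | Known.One` test in the hunk above only fires once known bits have been propagated up from the operands. As a minimal sketch of that propagation for bitwise AND and OR, written in plain C++ rather than against LLVM's `KnownBits` class (all names below are hypothetical):

```cpp
#include <cstdint>

// Hypothetical 32-bit model of known bits.
struct KnownBits32 {
  uint32_t Zero; // bits known to be 0
  uint32_t One;  // bits known to be 1
};

// A constant has every bit known.
KnownBits32 knownConstant(uint32_t V) { return {~V, V}; }

// An arbitrary value has no bits known.
KnownBits32 knownNothing() { return {0u, 0u}; }

// AND: a result bit is 0 if either input bit is 0, and 1 only if both are 1.
KnownBits32 knownAnd(KnownBits32 A, KnownBits32 B) {
  return {A.Zero | B.Zero, A.One & B.One};
}

// OR: a result bit is 1 if either input bit is 1, and 0 only if both are 0.
KnownBits32 knownOr(KnownBits32 A, KnownBits32 B) {
  return {A.Zero & B.Zero, A.One | B.One};
}

// Mirrors the DemandedBits.isSubsetOf(Known.Zero | Known.One) check.
bool allDemandedKnown(KnownBits32 K, uint32_t DemandedBits) {
  return (DemandedBits & ~(K.Zero | K.One)) == 0;
}
```

For instance, masking the bit pattern of `-1.0f` (`0xbf800000`) with `0x7fffffff` leaves every bit known, so the result could be folded to the constant `1.0f`. Masking an unknown value with `0x80000000` leaves only the low 31 bits known.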

llvm/test/CodeGen/ARM/fcopysign.ll

	; HARD: @ %bb.0: @ %entry			; HARD: @ %bb.0: @ %entry
	; HARD-NEXT: .save {r11, lr}			; HARD-NEXT: .save {r11, lr}
	; HARD-NEXT: push {r11, lr}			; HARD-NEXT: push {r11, lr}
	; HARD-NEXT: bl bar			; HARD-NEXT: bl bar
	; HARD-NEXT: vmov d16, r0, r1			; HARD-NEXT: vmov d16, r0, r1
	; HARD-NEXT: vcvt.f32.f64 s0, d16			; HARD-NEXT: vcvt.f32.f64 s0, d16
	; HARD-NEXT: vmov.i32 d17, #0x80000000			; HARD-NEXT: vmov.i32 d17, #0x80000000
	; HARD-NEXT: vshr.u64 d16, d16, #32			; HARD-NEXT: vshr.u64 d16, d16, #32
	; HARD-NEXT: vmov.f32 s2, #5.000000e-01			; HARD-NEXT: vmov.i32 d18, #0x3f000000
	; HARD-NEXT: vbit d1, d16, d17			; HARD-NEXT: vorr d1, d17, d17
				; HARD-NEXT: vbsl d1, d16, d18
	; HARD-NEXT: vadd.f32 s0, s0, s2			; HARD-NEXT: vadd.f32 s0, s0, s2
	; HARD-NEXT: pop {r11, pc}			; HARD-NEXT: pop {r11, pc}
	entry:			entry:
	%0 = tail call double (...) @bar() nounwind			%0 = tail call double (...) @bar() nounwind
	%1 = fptrunc double %0 to float			%1 = fptrunc double %0 to float
	%2 = tail call float @copysignf(float 5.000000e-01, float %1) nounwind readnone			%2 = tail call float @copysignf(float 5.000000e-01, float %1) nounwind readnone
	%3 = fadd float %1, %2			%3 = fadd float %1, %2
	ret float %3			ret float %3
	}			}

	declare double @bar(...)			declare double @bar(...)
	declare double @copysign(double, double) nounwind			declare double @copysign(double, double) nounwind
	declare float @copysignf(float, float) nounwind			declare float @copysignf(float, float) nounwind

llvm/test/CodeGen/X86/combine-bextr.ll


	define float @bextr_uitofp(i32 %x, i32 %y) {			define float @bextr_uitofp(i32 %x, i32 %y) {
	; X32-LABEL: bextr_uitofp:			; X32-LABEL: bextr_uitofp:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushl %eax			; X32-NEXT: pushl %eax
	; X32-NEXT: .cfi_def_cfa_offset 8			; X32-NEXT: .cfi_def_cfa_offset 8
	; X32-NEXT: movl $3855, %eax # imm = 0xF0F			; X32-NEXT: movl $3855, %eax # imm = 0xF0F
	; X32-NEXT: bextrl %eax, {{[0-9]+}}(%esp), %eax			; X32-NEXT: bextrl %eax, {{[0-9]+}}(%esp), %eax
	; X32-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X32-NEXT: movd %eax, %xmm0
	; X32-NEXT: movd %eax, %xmm1			; X32-NEXT: por {{\.LCPI.*}}, %xmm0
	; X32-NEXT: por %xmm0, %xmm1			; X32-NEXT: subsd {{\.LCPI.*}}, %xmm0
	; X32-NEXT: subsd %xmm0, %xmm1			; X32-NEXT: cvtsd2ss %xmm0, %xmm0
	; X32-NEXT: xorps %xmm0, %xmm0
	; X32-NEXT: cvtsd2ss %xmm1, %xmm0
	; X32-NEXT: movss %xmm0, (%esp)			; X32-NEXT: movss %xmm0, (%esp)
	; X32-NEXT: flds (%esp)			; X32-NEXT: flds (%esp)
	; X32-NEXT: popl %eax			; X32-NEXT: popl %eax
	; X32-NEXT: .cfi_def_cfa_offset 4			; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: bextr_uitofp:			; X64-LABEL: bextr_uitofp:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movl $3855, %eax # imm = 0xF0F			; X64-NEXT: movl $3855, %eax # imm = 0xF0F
	; X64-NEXT: bextrl %eax, %edi, %eax			; X64-NEXT: bextrl %eax, %edi, %eax
	; X64-NEXT: cvtsi2ss %eax, %xmm0			; X64-NEXT: cvtsi2ss %eax, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%1 = tail call i32 @llvm.x86.bmi.bextr.32(i32 %x, i32 3855)			%1 = tail call i32 @llvm.x86.bmi.bextr.32(i32 %x, i32 3855)
	%2 = uitofp i32 %1 to float			%2 = uitofp i32 %1 to float
	ret float %2			ret float %2
	}			}
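The `por`/`subsd` pair in the updated checks looks like the usual bias trick for converting an unsigned 32-bit integer to double: OR the integer into the low mantissa bits of a large power of two, then subtract that power of two. The constant-pool values are hidden behind FileCheck patterns here, so the 2^52 bias (`0x4330000000000000`) used below is an assumption, and `uint32ToDouble` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit double");

double uint32ToDouble(uint32_t X) {
  const uint64_t BiasBits = 0x4330000000000000ULL; // bit pattern of 2^52 (assumed)

  // "por": place X in the low 32 mantissa bits, giving the double 2^52 + X.
  uint64_t BiasedBits = BiasBits | X;

  double Biased, Bias;
  std::memcpy(&Biased, &BiasedBits, sizeof Biased);
  std::memcpy(&Bias, &BiasBits, sizeof Bias);

  // "subsd": (2^52 + X) - 2^52 is exactly X, since both are representable.
  return Biased - Bias;
}
```

With the patch, both the OR and the subtract take their constant directly from memory, which is why the separate `movq` of the bias disappears from the 32-bit checks.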

llvm/test/CodeGen/X86/copysign-constant-magnitude.ll

	}			}

	; CHECK: [[SIGNMASK2:L.+]]:			; CHECK: [[SIGNMASK2:L.+]]:
	; CHECK-NEXT: .quad 0x8000000000000000 ## double -0			; CHECK-NEXT: .quad 0x8000000000000000 ## double -0

	define double @mag_neg0_double(double %x) nounwind {			define double @mag_neg0_double(double %x) nounwind {
	; CHECK-LABEL: mag_neg0_double:			; CHECK-LABEL: mag_neg0_double:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; CHECK-NEXT: andps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: andps %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = call double @copysign(double -0.0, double %x)			%y = call double @copysign(double -0.0, double %x)
	ret double %y			ret double %y
	}			}

	; CHECK: [[SIGNMASK3:L.+]]:			; CHECK: [[SIGNMASK3:L.+]]:
	; CHECK-NEXT: .quad 0x8000000000000000 ## double -0			; CHECK-NEXT: .quad 0x8000000000000000 ## double -0
	; CHECK-NEXT: .quad 0x8000000000000000 ## double -0			; CHECK-NEXT: .quad 0x8000000000000000 ## double -0
	; CHECK: [[ONE3:L.+]]:			; CHECK: [[ONE3:L.+]]:
	; CHECK-NEXT: .quad 0x3ff0000000000000 ## double 1			; CHECK-NEXT: .quad 0x3ff0000000000000 ## double 1

	define double @mag_pos1_double(double %x) nounwind {			define double @mag_pos1_double(double %x) nounwind {
	; CHECK-LABEL: mag_pos1_double:			; CHECK-LABEL: mag_pos1_double:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	; CHECK-NEXT: andps {{.*}}(%rip), %xmm0			; CHECK-NEXT: andps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; CHECK-NEXT: orps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: orps %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = call double @copysign(double 1.0, double %x)			%y = call double @copysign(double 1.0, double %x)
	ret double %y			ret double %y
	}			}

	; CHECK: [[SIGNMASK4:L.+]]:			; CHECK: [[SIGNMASK4:L.+]]:
	; CHECK-NEXT: .quad 0x8000000000000000 ## double -0			; CHECK-NEXT: .quad 0x8000000000000000 ## double -0
	; CHECK-NEXT: .quad 0x8000000000000000 ## double -0			; CHECK-NEXT: .quad 0x8000000000000000 ## double -0
	[...]
	}			}

	; CHECK: [[SIGNMASK6:L.+]]:			; CHECK: [[SIGNMASK6:L.+]]:
	; CHECK-NEXT: .long 0x80000000 ## float -0			; CHECK-NEXT: .long 0x80000000 ## float -0

	define float @mag_neg0_float(float %x) nounwind {			define float @mag_neg0_float(float %x) nounwind {
	; CHECK-LABEL: mag_neg0_float:			; CHECK-LABEL: mag_neg0_float:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; CHECK-NEXT: andps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: andps %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = call float @copysignf(float -0.0, float %x)			%y = call float @copysignf(float -0.0, float %x)
	ret float %y			ret float %y
	}			}

	; CHECK: [[SIGNMASK7:L.+]]:			; CHECK: [[SIGNMASK7:L.+]]:
	; CHECK-NEXT: .long 0x80000000 ## float -0			; CHECK-NEXT: .long 0x80000000 ## float -0
	; CHECK-NEXT: .long 0x80000000 ## float -0			; CHECK-NEXT: .long 0x80000000 ## float -0
	; CHECK-NEXT: .long 0x80000000 ## float -0			; CHECK-NEXT: .long 0x80000000 ## float -0
	; CHECK-NEXT: .long 0x80000000 ## float -0			; CHECK-NEXT: .long 0x80000000 ## float -0
	; CHECK: [[ONE7:L.+]]:			; CHECK: [[ONE7:L.+]]:
	; CHECK-NEXT: .long 0x3f800000 ## float 1			; CHECK-NEXT: .long 0x3f800000 ## float 1

	define float @mag_pos1_float(float %x) nounwind {			define float @mag_pos1_float(float %x) nounwind {
	; CHECK-LABEL: mag_pos1_float:			; CHECK-LABEL: mag_pos1_float:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	; CHECK-NEXT: andps {{.*}}(%rip), %xmm0			; CHECK-NEXT: andps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; CHECK-NEXT: orps {{.*}}(%rip), %xmm0
	; CHECK-NEXT: orps %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%y = call float @copysignf(float 1.0, float %x)			%y = call float @copysignf(float 1.0, float %x)
	ret float %y			ret float %y
	}			}

	; CHECK: [[SIGNMASK8:L.+]]:			; CHECK: [[SIGNMASK8:L.+]]:
	; CHECK-NEXT: .long 0x80000000 ## float -0			; CHECK-NEXT: .long 0x80000000 ## float -0
	; CHECK-NEXT: .long 0x80000000 ## float -0			; CHECK-NEXT: .long 0x80000000 ## float -0

llvm/test/CodeGen/X86/fp-intrinsics.ll

	; X87-NEXT: addl $12, %esp			; X87-NEXT: addl $12, %esp
	; X87-NEXT: .cfi_def_cfa_offset 4			; X87-NEXT: .cfi_def_cfa_offset 4
	; X87-NEXT: retl			; X87-NEXT: retl
	;			;
	; X86-SSE-LABEL: uifdi:			; X86-SSE-LABEL: uifdi:
	; X86-SSE: # %bb.0: # %entry			; X86-SSE: # %bb.0: # %entry
	; X86-SSE-NEXT: subl $12, %esp			; X86-SSE-NEXT: subl $12, %esp
	; X86-SSE-NEXT: .cfi_def_cfa_offset 16			; X86-SSE-NEXT: .cfi_def_cfa_offset 16
	; X86-SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X86-SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X86-SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; X86-SSE-NEXT: orpd {{\.LCPI.*}}, %xmm0
	; X86-SSE-NEXT: orpd %xmm0, %xmm1			; X86-SSE-NEXT: subsd {{\.LCPI.*}}, %xmm0
	; X86-SSE-NEXT: subsd %xmm0, %xmm1			; X86-SSE-NEXT: movsd %xmm0, (%esp)
	; X86-SSE-NEXT: movsd %xmm1, (%esp)
	; X86-SSE-NEXT: fldl (%esp)			; X86-SSE-NEXT: fldl (%esp)
	; X86-SSE-NEXT: wait			; X86-SSE-NEXT: wait
	; X86-SSE-NEXT: addl $12, %esp			; X86-SSE-NEXT: addl $12, %esp
	; X86-SSE-NEXT: .cfi_def_cfa_offset 4			; X86-SSE-NEXT: .cfi_def_cfa_offset 4
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	;			;
	; SSE-LABEL: uifdi:			; SSE-LABEL: uifdi:
	; SSE: # %bb.0: # %entry			; SSE: # %bb.0: # %entry
	[...]
	; X87-NEXT: addl $12, %esp			; X87-NEXT: addl $12, %esp
	; X87-NEXT: .cfi_def_cfa_offset 4			; X87-NEXT: .cfi_def_cfa_offset 4
	; X87-NEXT: retl			; X87-NEXT: retl
	;			;
	; X86-SSE-LABEL: uiffi:			; X86-SSE-LABEL: uiffi:
	; X86-SSE: # %bb.0: # %entry			; X86-SSE: # %bb.0: # %entry
	; X86-SSE-NEXT: pushl %eax			; X86-SSE-NEXT: pushl %eax
	; X86-SSE-NEXT: .cfi_def_cfa_offset 8			; X86-SSE-NEXT: .cfi_def_cfa_offset 8
	; X86-SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X86-SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X86-SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; X86-SSE-NEXT: orpd {{\.LCPI.*}}, %xmm0
	; X86-SSE-NEXT: orpd %xmm0, %xmm1			; X86-SSE-NEXT: subsd {{\.LCPI.*}}, %xmm0
	; X86-SSE-NEXT: subsd %xmm0, %xmm1			; X86-SSE-NEXT: cvtsd2ss %xmm0, %xmm0
	; X86-SSE-NEXT: xorps %xmm0, %xmm0
	; X86-SSE-NEXT: cvtsd2ss %xmm1, %xmm0
	; X86-SSE-NEXT: movss %xmm0, (%esp)			; X86-SSE-NEXT: movss %xmm0, (%esp)
	; X86-SSE-NEXT: flds (%esp)			; X86-SSE-NEXT: flds (%esp)
	; X86-SSE-NEXT: wait			; X86-SSE-NEXT: wait
	; X86-SSE-NEXT: popl %eax			; X86-SSE-NEXT: popl %eax
	; X86-SSE-NEXT: .cfi_def_cfa_offset 4			; X86-SSE-NEXT: .cfi_def_cfa_offset 4
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	;			;
	; SSE-LABEL: uiffi:			; SSE-LABEL: uiffi:

llvm/test/CodeGen/X86/fp-round.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse2 | FileCheck %s --check-prefix=SSE2			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse2 | FileCheck %s --check-prefix=SSE2
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse4.1 | FileCheck %s --check-prefixes=SSE41			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+sse4.1 | FileCheck %s --check-prefixes=SSE41
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s --check-prefixes=AVX,AVX1			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s --check-prefixes=AVX,AVX1
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f | FileCheck %s --check-prefixes=AVX,AVX512			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512f | FileCheck %s --check-prefixes=AVX,AVX512

	define float @round_f32(float %x) {			define float @round_f32(float %x) {
	; SSE2-LABEL: round_f32:			; SSE2-LABEL: round_f32:
	; SSE2: ## %bb.0:			; SSE2: ## %bb.0:
	; SSE2-NEXT: jmp _roundf ## TAILCALL			; SSE2-NEXT: jmp _roundf ## TAILCALL
	;			;
	; SSE41-LABEL: round_f32:			; SSE41-LABEL: round_f32:
	; SSE41: ## %bb.0:			; SSE41: ## %bb.0:
	; SSE41-NEXT: movaps {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]			; SSE41-NEXT: movaps {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
	; SSE41-NEXT: andps %xmm0, %xmm1			; SSE41-NEXT: andps %xmm0, %xmm1
	; SSE41-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; SSE41-NEXT: orps {{.*}}(%rip), %xmm1
	; SSE41-NEXT: orps %xmm1, %xmm2			; SSE41-NEXT: addss %xmm0, %xmm1
	; SSE41-NEXT: addss %xmm0, %xmm2
	; SSE41-NEXT: xorps %xmm0, %xmm0			; SSE41-NEXT: xorps %xmm0, %xmm0
	; SSE41-NEXT: roundss $11, %xmm2, %xmm0			; SSE41-NEXT: roundss $11, %xmm1, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: round_f32:			; AVX1-LABEL: round_f32:
	; AVX1: ## %bb.0:			; AVX1: ## %bb.0:
	; AVX1-NEXT: vandps {{.*}}(%rip), %xmm0, %xmm1			; AVX1-NEXT: vandps {{.*}}(%rip), %xmm0, %xmm1
	; AVX1-NEXT: vbroadcastss {{.*#+}} xmm2 = [4.9999997E-1,4.9999997E-1,4.9999997E-1,4.9999997E-1]			; AVX1-NEXT: vbroadcastss {{.*#+}} xmm2 = [4.9999997E-1,4.9999997E-1,4.9999997E-1,4.9999997E-1]
	; AVX1-NEXT: vorps %xmm1, %xmm2, %xmm1			; AVX1-NEXT: vorps %xmm1, %xmm2, %xmm1
	; AVX1-NEXT: vaddss %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vaddss %xmm1, %xmm0, %xmm0
	[...]
	; SSE2-LABEL: round_f64:			; SSE2-LABEL: round_f64:
	; SSE2: ## %bb.0:			; SSE2: ## %bb.0:
	; SSE2-NEXT: jmp _round ## TAILCALL			; SSE2-NEXT: jmp _round ## TAILCALL
	;			;
	; SSE41-LABEL: round_f64:			; SSE41-LABEL: round_f64:
	; SSE41: ## %bb.0:			; SSE41: ## %bb.0:
	; SSE41-NEXT: movapd {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0]			; SSE41-NEXT: movapd {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0]
	; SSE41-NEXT: andpd %xmm0, %xmm1			; SSE41-NEXT: andpd %xmm0, %xmm1
	; SSE41-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero			; SSE41-NEXT: orpd {{.*}}(%rip), %xmm1
	; SSE41-NEXT: orpd %xmm1, %xmm2			; SSE41-NEXT: addsd %xmm0, %xmm1
	; SSE41-NEXT: addsd %xmm0, %xmm2
	; SSE41-NEXT: xorps %xmm0, %xmm0			; SSE41-NEXT: xorps %xmm0, %xmm0
	; SSE41-NEXT: roundsd $11, %xmm2, %xmm0			; SSE41-NEXT: roundsd $11, %xmm1, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: round_f64:			; AVX-LABEL: round_f64:
	; AVX: ## %bb.0:			; AVX: ## %bb.0:
	; AVX-NEXT: vandpd {{.*}}(%rip), %xmm0, %xmm1			; AVX-NEXT: vandpd {{.*}}(%rip), %xmm0, %xmm1
	; AVX-NEXT: vmovddup {{.*#+}} xmm2 = [4.9999999999999994E-1,4.9999999999999994E-1]			; AVX-NEXT: vmovddup {{.*#+}} xmm2 = [4.9999999999999994E-1,4.9999999999999994E-1]
	; AVX-NEXT: ## xmm2 = mem[0,0]			; AVX-NEXT: ## xmm2 = mem[0,0]
	; AVX-NEXT: vorpd %xmm1, %xmm2, %xmm1			; AVX-NEXT: vorpd %xmm1, %xmm2, %xmm1
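The SSE4.1 sequence above (mask out the sign, OR in a constant, add, then `roundss $11`) is consistent with rounding half away from zero by adding the largest float below 0.5, carrying the sign of the input, and then truncating. The AVX checks print that constant as 4.9999997E-1. A minimal sketch of the same computation in portable C++, where `roundHalfAway` is a hypothetical name:

```cpp
#include <cmath>

float roundHalfAway(float X) {
  // Largest float strictly below 0.5 (prints as 4.9999997E-1).
  const float AlmostHalf = std::nextafter(0.5f, 0.0f);

  // andps + orps: give the constant the sign of X.
  float Offset = std::copysign(AlmostHalf, X);

  // addss + roundss with the truncate rounding mode.
  return std::trunc(X + Offset);
}
```

Using the value just below 0.5 rather than 0.5 itself keeps inputs such as 0.49999997 from being pushed up to 1.0 by the addition.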

llvm/test/CodeGen/X86/fp-strict-scalar-inttofp.ll

; X87-NEXT: retl
ret float %result		ret float %result
}		}

define float @uitofp_i32tof32(i32 %x) #0 {		define float @uitofp_i32tof32(i32 %x) #0 {
; SSE-X86-LABEL: uitofp_i32tof32:		; SSE-X86-LABEL: uitofp_i32tof32:
; SSE-X86: # %bb.0:		; SSE-X86: # %bb.0:
; SSE-X86-NEXT: pushl %eax		; SSE-X86-NEXT: pushl %eax
; SSE-X86-NEXT: .cfi_def_cfa_offset 8		; SSE-X86-NEXT: .cfi_def_cfa_offset 8
; SSE-X86-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero		; SSE-X86-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-X86-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; SSE-X86-NEXT: orpd {{\.LCPI.*}}, %xmm0
; SSE-X86-NEXT: orpd %xmm0, %xmm1		; SSE-X86-NEXT: subsd {{\.LCPI.*}}, %xmm0
; SSE-X86-NEXT: subsd %xmm0, %xmm1		; SSE-X86-NEXT: cvtsd2ss %xmm0, %xmm0
; SSE-X86-NEXT: xorps %xmm0, %xmm0
; SSE-X86-NEXT: cvtsd2ss %xmm1, %xmm0
; SSE-X86-NEXT: movss %xmm0, (%esp)		; SSE-X86-NEXT: movss %xmm0, (%esp)
; SSE-X86-NEXT: flds (%esp)		; SSE-X86-NEXT: flds (%esp)
; SSE-X86-NEXT: wait		; SSE-X86-NEXT: wait
; SSE-X86-NEXT: popl %eax		; SSE-X86-NEXT: popl %eax
; SSE-X86-NEXT: .cfi_def_cfa_offset 4		; SSE-X86-NEXT: .cfi_def_cfa_offset 4
; SSE-X86-NEXT: retl		; SSE-X86-NEXT: retl
;		;
; SSE-X64-LABEL: uitofp_i32tof32:		; SSE-X64-LABEL: uitofp_i32tof32:
; SSE-X64: # %bb.0:		; SSE-X64: # %bb.0:
; SSE-X64-NEXT: movl %edi, %eax		; SSE-X64-NEXT: movl %edi, %eax
; SSE-X64-NEXT: cvtsi2ss %rax, %xmm0		; SSE-X64-NEXT: cvtsi2ss %rax, %xmm0
; SSE-X64-NEXT: retq		; SSE-X64-NEXT: retq
;		;
; AVX1-X86-LABEL: uitofp_i32tof32:		; AVX1-X86-LABEL: uitofp_i32tof32:
; AVX1-X86: # %bb.0:		; AVX1-X86: # %bb.0:
; AVX1-X86-NEXT: pushl %eax		; AVX1-X86-NEXT: pushl %eax
; AVX1-X86-NEXT: .cfi_def_cfa_offset 8		; AVX1-X86-NEXT: .cfi_def_cfa_offset 8
; AVX1-X86-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero		; AVX1-X86-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX1-X86-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; AVX1-X86-NEXT: vorpd {{\.LCPI.*}}, %xmm0, %xmm0
; AVX1-X86-NEXT: vorpd %xmm0, %xmm1, %xmm1		; AVX1-X86-NEXT: vsubsd {{\.LCPI.*}}, %xmm0, %xmm0
; AVX1-X86-NEXT: vsubsd %xmm0, %xmm1, %xmm0
; AVX1-X86-NEXT: vcvtsd2ss %xmm0, %xmm0, %xmm0		; AVX1-X86-NEXT: vcvtsd2ss %xmm0, %xmm0, %xmm0
; AVX1-X86-NEXT: vmovss %xmm0, (%esp)		; AVX1-X86-NEXT: vmovss %xmm0, (%esp)
; AVX1-X86-NEXT: flds (%esp)		; AVX1-X86-NEXT: flds (%esp)
; AVX1-X86-NEXT: wait		; AVX1-X86-NEXT: wait
; AVX1-X86-NEXT: popl %eax		; AVX1-X86-NEXT: popl %eax
; AVX1-X86-NEXT: .cfi_def_cfa_offset 4		; AVX1-X86-NEXT: .cfi_def_cfa_offset 4
; AVX1-X86-NEXT: retl		; AVX1-X86-NEXT: retl
;		;
[...]
; SSE-X86: # %bb.0:		; SSE-X86: # %bb.0:
; SSE-X86-NEXT: pushl %ebp		; SSE-X86-NEXT: pushl %ebp
; SSE-X86-NEXT: .cfi_def_cfa_offset 8		; SSE-X86-NEXT: .cfi_def_cfa_offset 8
; SSE-X86-NEXT: .cfi_offset %ebp, -8		; SSE-X86-NEXT: .cfi_offset %ebp, -8
; SSE-X86-NEXT: movl %esp, %ebp		; SSE-X86-NEXT: movl %esp, %ebp
; SSE-X86-NEXT: .cfi_def_cfa_register %ebp		; SSE-X86-NEXT: .cfi_def_cfa_register %ebp
; SSE-X86-NEXT: andl $-8, %esp		; SSE-X86-NEXT: andl $-8, %esp
; SSE-X86-NEXT: subl $8, %esp		; SSE-X86-NEXT: subl $8, %esp
; SSE-X86-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero		; SSE-X86-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-X86-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; SSE-X86-NEXT: orpd {{\.LCPI.*}}, %xmm0
; SSE-X86-NEXT: orpd %xmm0, %xmm1		; SSE-X86-NEXT: subsd {{\.LCPI.*}}, %xmm0
; SSE-X86-NEXT: subsd %xmm0, %xmm1		; SSE-X86-NEXT: movsd %xmm0, (%esp)
; SSE-X86-NEXT: movsd %xmm1, (%esp)
; SSE-X86-NEXT: fldl (%esp)		; SSE-X86-NEXT: fldl (%esp)
; SSE-X86-NEXT: wait		; SSE-X86-NEXT: wait
; SSE-X86-NEXT: movl %ebp, %esp		; SSE-X86-NEXT: movl %ebp, %esp
; SSE-X86-NEXT: popl %ebp		; SSE-X86-NEXT: popl %ebp
; SSE-X86-NEXT: .cfi_def_cfa %esp, 4		; SSE-X86-NEXT: .cfi_def_cfa %esp, 4
; SSE-X86-NEXT: retl		; SSE-X86-NEXT: retl
;		;
; SSE-X64-LABEL: uitofp_i32tof64:		; SSE-X64-LABEL: uitofp_i32tof64:
; SSE-X64: # %bb.0:		; SSE-X64: # %bb.0:
; SSE-X64-NEXT: movl %edi, %eax		; SSE-X64-NEXT: movl %edi, %eax
; SSE-X64-NEXT: cvtsi2sd %rax, %xmm0		; SSE-X64-NEXT: cvtsi2sd %rax, %xmm0
; SSE-X64-NEXT: retq		; SSE-X64-NEXT: retq
;		;
; AVX1-X86-LABEL: uitofp_i32tof64:		; AVX1-X86-LABEL: uitofp_i32tof64:
; AVX1-X86: # %bb.0:		; AVX1-X86: # %bb.0:
; AVX1-X86-NEXT: pushl %ebp		; AVX1-X86-NEXT: pushl %ebp
; AVX1-X86-NEXT: .cfi_def_cfa_offset 8		; AVX1-X86-NEXT: .cfi_def_cfa_offset 8
; AVX1-X86-NEXT: .cfi_offset %ebp, -8		; AVX1-X86-NEXT: .cfi_offset %ebp, -8
; AVX1-X86-NEXT: movl %esp, %ebp		; AVX1-X86-NEXT: movl %esp, %ebp
; AVX1-X86-NEXT: .cfi_def_cfa_register %ebp		; AVX1-X86-NEXT: .cfi_def_cfa_register %ebp
; AVX1-X86-NEXT: andl $-8, %esp		; AVX1-X86-NEXT: andl $-8, %esp
; AVX1-X86-NEXT: subl $8, %esp		; AVX1-X86-NEXT: subl $8, %esp
; AVX1-X86-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero		; AVX1-X86-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX1-X86-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero		; AVX1-X86-NEXT: vorpd {{\.LCPI.*}}, %xmm0, %xmm0
; AVX1-X86-NEXT: vorpd %xmm0, %xmm1, %xmm1		; AVX1-X86-NEXT: vsubsd {{\.LCPI.*}}, %xmm0, %xmm0
; AVX1-X86-NEXT: vsubsd %xmm0, %xmm1, %xmm0
; AVX1-X86-NEXT: vmovsd %xmm0, (%esp)		; AVX1-X86-NEXT: vmovsd %xmm0, (%esp)
; AVX1-X86-NEXT: fldl (%esp)		; AVX1-X86-NEXT: fldl (%esp)
; AVX1-X86-NEXT: wait		; AVX1-X86-NEXT: wait
; AVX1-X86-NEXT: movl %ebp, %esp		; AVX1-X86-NEXT: movl %ebp, %esp
; AVX1-X86-NEXT: popl %ebp		; AVX1-X86-NEXT: popl %ebp
; AVX1-X86-NEXT: .cfi_def_cfa %esp, 4		; AVX1-X86-NEXT: .cfi_def_cfa %esp, 4
; AVX1-X86-NEXT: retl		; AVX1-X86-NEXT: retl
;		;

llvm/test/CodeGen/X86/fp128-cast.ll

	; X64-SSE-LABEL: TestTruncCopysign:			; X64-SSE-LABEL: TestTruncCopysign:
	; X64-SSE: # %bb.0: # %entry			; X64-SSE: # %bb.0: # %entry
	; X64-SSE-NEXT: cmpl $50001, %edi # imm = 0xC351			; X64-SSE-NEXT: cmpl $50001, %edi # imm = 0xC351
	; X64-SSE-NEXT: jl .LBB26_2			; X64-SSE-NEXT: jl .LBB26_2
	; X64-SSE-NEXT: # %bb.1: # %if.then			; X64-SSE-NEXT: # %bb.1: # %if.then
	; X64-SSE-NEXT: pushq %rax			; X64-SSE-NEXT: pushq %rax
	; X64-SSE-NEXT: callq __trunctfdf2			; X64-SSE-NEXT: callq __trunctfdf2
	; X64-SSE-NEXT: andps {{.*}}(%rip), %xmm0			; X64-SSE-NEXT: andps {{.*}}(%rip), %xmm0
	; X64-SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; X64-SSE-NEXT: orps {{.*}}(%rip), %xmm0
	; X64-SSE-NEXT: orps %xmm1, %xmm0
	; X64-SSE-NEXT: callq __extenddftf2			; X64-SSE-NEXT: callq __extenddftf2
	; X64-SSE-NEXT: addq $8, %rsp			; X64-SSE-NEXT: addq $8, %rsp
	; X64-SSE-NEXT: .LBB26_2: # %cleanup			; X64-SSE-NEXT: .LBB26_2: # %cleanup
	; X64-SSE-NEXT: retq			; X64-SSE-NEXT: retq
	;			;
	; X32-LABEL: TestTruncCopysign:			; X32-LABEL: TestTruncCopysign:
	; X32: # %bb.0: # %entry			; X32: # %bb.0: # %entry
	; X32-NEXT: pushl %edi			; X32-NEXT: pushl %edi

llvm/test/CodeGen/X86/scalar-int-to-fp.ll

	; AVX512_64-LABEL: u32_to_f:			; AVX512_64-LABEL: u32_to_f:
	; AVX512_64: # %bb.0:			; AVX512_64: # %bb.0:
	; AVX512_64-NEXT: vcvtusi2ss %edi, %xmm0, %xmm0			; AVX512_64-NEXT: vcvtusi2ss %edi, %xmm0, %xmm0
	; AVX512_64-NEXT: retq			; AVX512_64-NEXT: retq
	;			;
	; SSE2_32-LABEL: u32_to_f:			; SSE2_32-LABEL: u32_to_f:
	; SSE2_32: # %bb.0:			; SSE2_32: # %bb.0:
	; SSE2_32-NEXT: pushl %eax			; SSE2_32-NEXT: pushl %eax
	; SSE2_32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; SSE2_32-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; SSE2_32-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; SSE2_32-NEXT: orpd {{\.LCPI.*}}, %xmm0
	; SSE2_32-NEXT: orpd %xmm0, %xmm1			; SSE2_32-NEXT: subsd {{\.LCPI.*}}, %xmm0
	; SSE2_32-NEXT: subsd %xmm0, %xmm1			; SSE2_32-NEXT: cvtsd2ss %xmm0, %xmm0
	; SSE2_32-NEXT: xorps %xmm0, %xmm0
	; SSE2_32-NEXT: cvtsd2ss %xmm1, %xmm0
	; SSE2_32-NEXT: movss %xmm0, (%esp)			; SSE2_32-NEXT: movss %xmm0, (%esp)
	; SSE2_32-NEXT: flds (%esp)			; SSE2_32-NEXT: flds (%esp)
	; SSE2_32-NEXT: popl %eax			; SSE2_32-NEXT: popl %eax
	; SSE2_32-NEXT: retl			; SSE2_32-NEXT: retl
	;			;
	; SSE2_64-LABEL: u32_to_f:			; SSE2_64-LABEL: u32_to_f:
	; SSE2_64: # %bb.0:			; SSE2_64: # %bb.0:
	; SSE2_64-NEXT: movl %edi, %eax			; SSE2_64-NEXT: movl %edi, %eax
	[...]
	; AVX512_64-NEXT: retq			; AVX512_64-NEXT: retq
	;			;
	; SSE2_32-LABEL: u32_to_d:			; SSE2_32-LABEL: u32_to_d:
	; SSE2_32: # %bb.0:			; SSE2_32: # %bb.0:
	; SSE2_32-NEXT: pushl %ebp			; SSE2_32-NEXT: pushl %ebp
	; SSE2_32-NEXT: movl %esp, %ebp			; SSE2_32-NEXT: movl %esp, %ebp
	; SSE2_32-NEXT: andl $-8, %esp			; SSE2_32-NEXT: andl $-8, %esp
	; SSE2_32-NEXT: subl $8, %esp			; SSE2_32-NEXT: subl $8, %esp
	; SSE2_32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; SSE2_32-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; SSE2_32-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; SSE2_32-NEXT: orpd {{\.LCPI.*}}, %xmm0
	; SSE2_32-NEXT: orpd %xmm0, %xmm1			; SSE2_32-NEXT: subsd {{\.LCPI.*}}, %xmm0
	; SSE2_32-NEXT: subsd %xmm0, %xmm1			; SSE2_32-NEXT: movsd %xmm0, (%esp)
	; SSE2_32-NEXT: movsd %xmm1, (%esp)
	; SSE2_32-NEXT: fldl (%esp)			; SSE2_32-NEXT: fldl (%esp)
	; SSE2_32-NEXT: movl %ebp, %esp			; SSE2_32-NEXT: movl %ebp, %esp
	; SSE2_32-NEXT: popl %ebp			; SSE2_32-NEXT: popl %ebp
	; SSE2_32-NEXT: retl			; SSE2_32-NEXT: retl
	;			;
	; SSE2_64-LABEL: u32_to_d:			; SSE2_64-LABEL: u32_to_d:
	; SSE2_64: # %bb.0:			; SSE2_64: # %bb.0:
	; SSE2_64-NEXT: movl %edi, %eax			; SSE2_64-NEXT: movl %edi, %eax

llvm/test/CodeGen/X86/uint_to_fp-2.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i386-unknown-unknown -mattr=+sse2 \| FileCheck %s			; RUN: llc < %s -mtriple=i386-unknown-unknown -mattr=+sse2 \| FileCheck %s

	; rdar://6504833			; rdar://6504833
	define float @test1(i32 %x) nounwind readnone {			define float @test1(i32 %x) nounwind readnone {
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: pushl %eax			; CHECK-NEXT: pushl %eax
	; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; CHECK-NEXT: orpd {{\.LCPI.*}}, %xmm0
	; CHECK-NEXT: orpd %xmm0, %xmm1			; CHECK-NEXT: subsd {{\.LCPI.*}}, %xmm0
	RKSimon: @craig.topper @spatel This looks like we're loading the same constant as (double x) and <double x, double ???> - do you know if we already have examples of this or is this a new type of regression?

	spatel: We might avoid that by checking number of uses of the constant? Looks like a general shortcoming of combining/hoisting constants:

		$ cat vec_const.ll
		define <4 x i32> @f(<4 x i32> %x) {
		  %a1 = add <4 x i32> %x, <i32 1, i32 2, i32 3, i32 4>
		  %a2 = xor <4 x i32> %a1, <i32 1, i32 2, i32 3, i32 undef>
		  ret <4 x i32> %a2
		}

		$ llc -o - vec_const.ll
		  .section __TEXT,__text,regular,pure_instructions
		  .build_version macos, 10, 15
		  .section __TEXT,__literal16,16byte_literals
		  .p2align 4 ## -- Begin function f
		LCPI0_0:
		  .long 1 ## 0x1
		  .long 2 ## 0x2
		  .long 3 ## 0x3
		  .long 4 ## 0x4
		LCPI0_1:
		  .long 1 ## 0x1
		  .long 2 ## 0x2
		  .long 3 ## 0x3
		  .space 4
		  .section __TEXT,__text,regular,pure_instructions
		  .globl _f
		  .p2align 4, 0x90
		_f: ## @f
		  .cfi_startproc
		## %bb.0:
		  paddd LCPI0_0(%rip), %xmm0
		  pxor LCPI0_1(%rip), %xmm0
		  retq

	RKSimon: Thanks, I've raised a bug at https://bugs.llvm.org/show_bug.cgi?id=47744
	; CHECK-NEXT: subsd %xmm0, %xmm1			; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0
	; CHECK-NEXT: xorps %xmm0, %xmm0
	; CHECK-NEXT: cvtsd2ss %xmm1, %xmm0
	; CHECK-NEXT: movss %xmm0, (%esp)			; CHECK-NEXT: movss %xmm0, (%esp)
	; CHECK-NEXT: flds (%esp)			; CHECK-NEXT: flds (%esp)
	; CHECK-NEXT: popl %eax			; CHECK-NEXT: popl %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	entry:			entry:
	%0 = uitofp i32 %x to float			%0 = uitofp i32 %x to float
	ret float %0			ret float %0
	}			}
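
For readers following the inline discussion on test1: the `orpd`/`subsd` pair in the CHECK lines looks like the usual bias trick for converting an unsigned 32-bit integer to double, where the integer is OR-ed into the mantissa of 2^52 and the bias is then subtracted. The review does not state this, so treat it as an assumption. The sketch below is a minimal illustration in C++; the function name `u32_to_double_bias` and the constant name are hypothetical and do not come from the patch.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of the patch: converts an unsigned 32-bit
// integer to double without an integer-to-FP conversion instruction.
double u32_to_double_bias(std::uint32_t x) {
    // Bit pattern of the double 2^52. At this magnitude one unit in the
    // last place is exactly 1.0, so the low mantissa bits count integers.
    const std::uint64_t kBiasBits = 0x4330000000000000ULL;

    // Assumed counterpart of the "orpd" step: place x in the low mantissa
    // bits, giving the double value 2^52 + x exactly.
    std::uint64_t bits = kBiasBits | static_cast<std::uint64_t>(x);

    double biased;
    double bias;
    std::memcpy(&biased, &bits, sizeof biased);
    std::memcpy(&bias, &kBiasBits, sizeof bias);

    // Assumed counterpart of the "subsd" step: remove the bias, leaving x.
    return biased - bias;
}
```

Under this reading, the same 2^52 bit pattern is needed twice: once as the operand of the OR and once as the operand of the subtraction. That would be consistent with RKSimon's observation that the constant now appears both as a scalar double and as the low lane of a vector.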

	; PR10802			; PR10802
	define float @test2(<4 x i32> %x) nounwind readnone ssp {			define float @test2(<4 x i32> %x) nounwind readnone ssp {
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: pushl %eax			; CHECK-NEXT: pushl %eax
	; CHECK-NEXT: xorps %xmm1, %xmm1			; CHECK-NEXT: xorps %xmm1, %xmm1
	; CHECK-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]			; CHECK-NEXT: movss {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
	; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; CHECK-NEXT: orps {{\.LCPI.*}}, %xmm1
	; CHECK-NEXT: orps %xmm0, %xmm1			; CHECK-NEXT: subsd {{\.LCPI.*}}, %xmm1
	; CHECK-NEXT: subsd %xmm0, %xmm1
	; CHECK-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm0, %xmm0
	; CHECK-NEXT: cvtsd2ss %xmm1, %xmm0			; CHECK-NEXT: cvtsd2ss %xmm1, %xmm0
	; CHECK-NEXT: movss %xmm0, (%esp)			; CHECK-NEXT: movss %xmm0, (%esp)
	; CHECK-NEXT: flds (%esp)			; CHECK-NEXT: flds (%esp)
	; CHECK-NEXT: popl %eax			; CHECK-NEXT: popl %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	entry:			entry:
	%vecext = extractelement <4 x i32> %x, i32 0			%vecext = extractelement <4 x i32> %x, i32 0
	%conv = uitofp i32 %vecext to float			%conv = uitofp i32 %vecext to float
	ret float %conv			ret float %conv
	}			}

llvm/test/CodeGen/X86/vector-shuffle-combining.ll

; AVX-NEXT: retq
%tmp10 = insertelement <8 x i16> %tmp9, i16 %cvt1, i32 5		%tmp10 = insertelement <8 x i16> %tmp9, i16 %cvt1, i32 5
%tmp11 = insertelement <8 x i16> %tmp10, i16 %cvt2, i32 6		%tmp11 = insertelement <8 x i16> %tmp10, i16 %cvt2, i32 6
%tmp12 = insertelement <8 x i16> %tmp11, i16 undef, i32 7		%tmp12 = insertelement <8 x i16> %tmp11, i16 undef, i32 7
%tmp13 = shufflevector <8 x i16> %tmp12, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 10, i32 3, i32 4, i32 5, i32 6, i32 7>		%tmp13 = shufflevector <8 x i16> %tmp12, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 10, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i16> %tmp13		ret <8 x i16> %tmp13
}		}

define void @PR43024() {		define void @PR43024() {
; SSE2-LABEL: PR43024:		; SSE-LABEL: PR43024:
; SSE2: # %bb.0:		; SSE: # %bb.0:
; SSE2-NEXT: movaps {{.*#+}} xmm0 = [NaN,NaN,0.0E+0,0.0E+0]		; SSE-NEXT: movaps {{.*#+}} xmm0 = [NaN,NaN,0.0E+0,0.0E+0]
; SSE2-NEXT: movaps %xmm0, (%rax)		; SSE-NEXT: movaps %xmm0, (%rax)
; SSE2-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: addss {{.*}}(%rip), %xmm0
; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1],xmm0[1,1]		; SSE-NEXT: xorps %xmm1, %xmm1
; SSE2-NEXT: addss %xmm0, %xmm1		; SSE-NEXT: addss %xmm1, %xmm0
; SSE2-NEXT: xorps %xmm0, %xmm0		; SSE-NEXT: addss %xmm1, %xmm0
; SSE2-NEXT: addss %xmm0, %xmm1		; SSE-NEXT: movss %xmm0, (%rax)
; SSE2-NEXT: addss %xmm0, %xmm1		; SSE-NEXT: retq
; SSE2-NEXT: movss %xmm1, (%rax)
; SSE2-NEXT: retq
;
; SSSE3-LABEL: PR43024:
; SSSE3: # %bb.0:
; SSSE3-NEXT: movaps {{.*#+}} xmm0 = [NaN,NaN,0.0E+0,0.0E+0]
; SSSE3-NEXT: movaps %xmm0, (%rax)
; SSSE3-NEXT: movshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
; SSSE3-NEXT: addss %xmm0, %xmm1
; SSSE3-NEXT: xorps %xmm0, %xmm0
; SSSE3-NEXT: addss %xmm0, %xmm1
; SSSE3-NEXT: addss %xmm0, %xmm1
; SSSE3-NEXT: movss %xmm1, (%rax)
; SSSE3-NEXT: retq
;
; SSE41-LABEL: PR43024:
; SSE41: # %bb.0:
; SSE41-NEXT: movaps {{.*#+}} xmm0 = [NaN,NaN,0.0E+0,0.0E+0]
; SSE41-NEXT: movaps %xmm0, (%rax)
; SSE41-NEXT: movshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
; SSE41-NEXT: addss %xmm0, %xmm1
; SSE41-NEXT: xorps %xmm0, %xmm0
; SSE41-NEXT: addss %xmm0, %xmm1
; SSE41-NEXT: addss %xmm0, %xmm1
; SSE41-NEXT: movss %xmm1, (%rax)
; SSE41-NEXT: retq
;		;
; AVX-LABEL: PR43024:		; AVX-LABEL: PR43024:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vmovaps {{.*#+}} xmm0 = [NaN,NaN,0.0E+0,0.0E+0]		; AVX-NEXT: vmovaps {{.*#+}} xmm0 = [NaN,NaN,0.0E+0,0.0E+0]
; AVX-NEXT: vmovaps %xmm0, (%rax)		; AVX-NEXT: vmovaps %xmm0, (%rax)
; AVX-NEXT: vaddss {{\.LCPI.}}+{{.}}(%rip), %xmm0, %xmm0		; AVX-NEXT: vaddss {{\.LCPI.}}+{{.}}(%rip), %xmm0, %xmm0
; AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1		; AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1
; AVX-NEXT: vaddss %xmm1, %xmm0, %xmm0		; AVX-NEXT: vaddss %xmm1, %xmm0, %xmm0

Revision Contents

Diff 296617

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

llvm/test/CodeGen/ARM/fcopysign.ll

llvm/test/CodeGen/X86/combine-bextr.ll

llvm/test/CodeGen/X86/copysign-constant-magnitude.ll

llvm/test/CodeGen/X86/fp-intrinsics.ll

llvm/test/CodeGen/X86/fp-round.ll

llvm/test/CodeGen/X86/fp-strict-scalar-inttofp.ll

llvm/test/CodeGen/X86/fp128-cast.ll

llvm/test/CodeGen/X86/scalar-int-to-fp.ll

llvm/test/CodeGen/X86/uint_to_fp-2.ll

llvm/test/CodeGen/X86/vector-shuffle-combining.ll
