This is an archive of the discontinued LLVM Phabricator instance.

Differential D51630

[DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))
Closed, Public

Authored by spatel on Sep 4 2018, 7:21 AM.

Details

Reviewers

lebedev.ri
efriedma
fhahn
evandro
javed.absar
t.p.northover

Commits

rGdbf52837fea5: [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))
rL341481: [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))

Summary

This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. Here, we only do the transform when the target tells us that sqrt can be lowered with inline code.
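The conditions under which the combine fires can be modeled outside of LLVM. The following is a minimal sketch, not LLVM code: `FastMathFlagsModel`, `TargetModel`, and `shouldExpandPowToSqrts` are hypothetical names that only mirror the order of the early-outs in the `visitFPOW` hunk of this revision (exponent check, fast-math flags, sqrt legality, size optimization).

```cpp
#include <cassert>

// Hypothetical stand-in for the fast-math flags that the combine inspects.
struct FastMathFlagsModel {
  bool NoSignedZeros; // nsz
  bool NoInfs;        // ninf
  bool ApproxFuncs;   // afn
};

// Hypothetical stand-in for what the target and function attributes report.
struct TargetModel {
  bool SqrtIsLegalOrCustom; // sqrt can be lowered with inline code
  bool OptForSize;          // function is optimized for size
};

// Returns true when pow(x, Exponent) would be rewritten as sqrt(sqrt(x))
// under the guards described in this revision.
bool shouldExpandPowToSqrts(double Exponent, FastMathFlagsModel Flags,
                            TargetModel Target) {
  if (Exponent != 0.25)
    return false; // only the basic 1/4 case is handled
  if (!Flags.NoSignedZeros || !Flags.NoInfs || !Flags.ApproxFuncs)
    return false; // need { nsz ninf afn }
  if (!Target.SqrtIsLegalOrCustom)
    return false; // do not double the number of libcalls
  if (Target.OptForSize)
    return false; // assume the libcall is the smallest code
  return true;
}

// Usage: each assert documents one guard.
void demoGuards() {
  assert(shouldExpandPowToSqrts(0.25, {true, true, true}, {true, false}));
  assert(!shouldExpandPowToSqrts(0.25, {true, true, false}, {true, false}));
  assert(!shouldExpandPowToSqrts(0.25, {true, true, true}, {false, false}));
  assert(!shouldExpandPowToSqrts(0.25, {true, true, true}, {true, true}));
}
```

Calling `demoGuards()` from a test driver exercises each early-out once.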

This is the basic case. I noted the potential enhancements that I imagined with TODO comments:

- Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper).
- If we have fewer fast-math-flags, generate code to avoid -0.0 and/or INF.
- Allow the transform when optimizing/minimizing size (might require a target hook to get that right).
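The first two TODO items can be illustrated in plain C++. This is a sketch under stated assumptions, not the DAGCombiner implementation: `powViaSqrts` is a hypothetical helper that applies `k` nested square roots (so `k == 2` corresponds to the pow(x, 0.25) rewrite), and the asserts show why -0.0 and -INF need either the nsz/ninf flags or extra code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper (not LLVM code): computes x^(2^-k) with k nested
// square roots. k == 2 models pow(x, 0.25) --> sqrt(sqrt(x)).
double powViaSqrts(double x, int k) {
  assert(k >= 0 && "number of square roots must be non-negative");
  for (int i = 0; i < k; ++i)
    x = std::sqrt(x);
  return x;
}

void demoSpecialValues() {
  const double inf = std::numeric_limits<double>::infinity();

  // Ordinary inputs: the nested square roots agree with the power.
  assert(powViaSqrts(16.0, 2) == 2.0);  // 16^(1/4)
  assert(powViaSqrts(256.0, 3) == 2.0); // 256^(1/8), the generalized case

  // pow(-0.0, 0.25) is +0.0, but sqrt(sqrt(-0.0)) keeps the sign: -0.0.
  assert(!std::signbit(std::pow(-0.0, 0.25)));
  assert(std::signbit(powViaSqrts(-0.0, 2)));

  // pow(-inf, 0.25) is +inf, but sqrt(sqrt(-inf)) is NaN.
  assert(std::pow(-inf, 0.25) == inf);
  assert(std::isnan(powViaSqrts(-inf, 2)));
}
```

The two mismatches are exactly the cases that a version of the transform without nsz/ninf would have to select out.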

Note that by default x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty of test coverage for that already, so I didn't bother to include extra testing for that here. AArch64 uses its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should also be covered by existing tests).
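For readers unfamiliar with the estimate-plus-refinement sequence, here is a minimal sketch of one Newton-Raphson step arranged the way the x86 CHECK lines in this revision appear to compute it, using the constants -0.5 and -3.0 that are visible in the vector test. The function names are hypothetical, and `roughRsqrt` is only a stand-in for the hardware estimate instruction (it deliberately perturbs the exact value by 0.1%).

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson refinement of a reciprocal-square-root estimate,
// returning an approximation of sqrt(x):
//   sqrt(x) ~= (-0.5 * x * e) * (x * e * e + -3.0)
// where e is the estimate of 1/sqrt(x). A zero input is selected out,
// because x * e would otherwise be 0 * inf = NaN.
float sqrtFromRsqrtEstimate(float x, float estimate) {
  if (x == 0.0f)
    return 0.0f;
  float xe = x * estimate;         // x * e
  float half = xe * -0.5f;         // -0.5 * x * e
  float t = xe * estimate + -3.0f; // x * e * e - 3
  return half * t;
}

// Hypothetical stand-in for a low-precision hardware estimate:
// the exact value scaled by a 0.1% error.
float roughRsqrt(float x) { return (1.0f / std::sqrt(x)) * 1.001f; }

void demoRefinement() {
  // The refined result is much closer to sqrt(2) than the 0.1% estimate.
  float refined = sqrtFromRsqrtEstimate(2.0f, roughRsqrt(2.0f));
  assert(std::fabs(refined - std::sqrt(2.0f)) < 1e-5f);
  // Zero is handled by the explicit select.
  assert(sqrtFromRsqrtEstimate(0.0f, roughRsqrt(1.0f)) == 0.0f);
}
```

With a relative error d in the estimate, the refined result has a relative error of roughly 1.5 * d * d, which is why a single step is a large improvement.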

A follow-on patch can extend this to handle the other pattern that we deferred: pow(x,1/3) --> cbrt. But that requires a bit more work because we don't currently have a FCBRT DAG node defined.
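As background for the cbrt pattern, the sketch below compares the two C++ library functions directly. It is an illustration in plain C++ with hypothetical wrapper names, and it says nothing about how the follow-on patch handles these cases; it only shows that the two forms disagree for negative inputs.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical wrappers around the two library calls being compared.
double cubeRootViaPow(double x) { return std::pow(x, 1.0 / 3.0); }
double cubeRootViaCbrt(double x) { return std::cbrt(x); }

void demoCubeRoot() {
  // For positive inputs the two forms agree closely.
  assert(std::fabs(cubeRootViaPow(27.0) - cubeRootViaCbrt(27.0)) < 1e-12);

  // For negative inputs they differ: pow with a negative base and a
  // non-integer exponent is a domain error (NaN), while cbrt is defined.
  assert(std::isnan(cubeRootViaPow(-8.0)));
  assert(std::fabs(cubeRootViaCbrt(-8.0) + 2.0) < 1e-12);
}
```

Note also that 1.0 / 3.0 is not exactly representable in binary floating point, so any pattern match on that exponent has to compare against the rounded constant.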

Event Timeline

spatel created this revision. Sep 4 2018, 7:21 AM

Herald added a reviewer: javed.absar. Sep 4 2018, 7:21 AM

Herald added a subscriber: mcrosier.

spatel edited the summary of this revision. Sep 4 2018, 10:27 AM

Could you possibly duplicate the tests for ARM? It looks like the patch does the right thing to me, but it'd be good to have it confirmed and tested.

In D51630#1224578, @t.p.northover wrote:

Could you possibly duplicate the tests for ARM? It looks like the patch does the right thing to me, but it'd be good to have it confirmed and tested.

Sure. I'm generally not sure what the right target specs are for an ARM RUN line.
Is this good: -mtriple=arm-eabi -mattr=neon ?
Do we want more RUNs to check codegen with other features?

They'd probably work. We often just use useful triples, maybe thumbv8-linux-gnueabihf, and perhaps thumbv7m-linux-gnueabi as a soft-float target that ought to use libcalls for everything (I'd probably just check there are the appropriate number of calls there rather than tracking all the marshalling nonsense that goes on).

Patch updated:
Added ARM (Thumb) tests. No changes otherwise.

Thanks! LGTM.

This revision is now accepted and ready to land. Sep 5 2018, 8:32 AM

Closed by commit rL341481: [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x)) (authored by spatel). Sep 5 2018, 10:03 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D51753: [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x). Sep 6 2018, 2:16 PM

spatel mentioned this in rL342348: [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x). Sep 16 2018, 9:51 AM

efriedma mentioned this in D101759: [PowerPC] Scalar IBM MASS library conversion pass. Aug 30 2021, 2:41 PM

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	41 lines
test/CodeGen/AArch64/pow.ll	69 lines
test/CodeGen/X86/pow.ll	95 lines

Diff 163805

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ private:
     SDValue visitFADD(SDNode *N);
     SDValue visitFSUB(SDNode *N);
     SDValue visitFMUL(SDNode *N);
     SDValue visitFMA(SDNode *N);
     SDValue visitFDIV(SDNode *N);
     SDValue visitFREM(SDNode *N);
     SDValue visitFSQRT(SDNode *N);
     SDValue visitFCOPYSIGN(SDNode *N);
+    SDValue visitFPOW(SDNode *N);
     SDValue visitSINT_TO_FP(SDNode *N);
     SDValue visitUINT_TO_FP(SDNode *N);
     SDValue visitFP_TO_SINT(SDNode *N);
     SDValue visitFP_TO_UINT(SDNode *N);
     SDValue visitFP_ROUND(SDNode *N);
     SDValue visitFP_ROUND_INREG(SDNode *N);
     SDValue visitFP_EXTEND(SDNode *N);
     SDValue visitFNEG(SDNode *N);
@@ SDValue DAGCombiner::visit(SDNode *N) {
   case ISD::FADD: return visitFADD(N);
   case ISD::FSUB: return visitFSUB(N);
   case ISD::FMUL: return visitFMUL(N);
   case ISD::FMA: return visitFMA(N);
   case ISD::FDIV: return visitFDIV(N);
   case ISD::FREM: return visitFREM(N);
   case ISD::FSQRT: return visitFSQRT(N);
   case ISD::FCOPYSIGN: return visitFCOPYSIGN(N);
+  case ISD::FPOW: return visitFPOW(N);
   case ISD::SINT_TO_FP: return visitSINT_TO_FP(N);
   case ISD::UINT_TO_FP: return visitUINT_TO_FP(N);
   case ISD::FP_TO_SINT: return visitFP_TO_SINT(N);
   case ISD::FP_TO_UINT: return visitFP_TO_UINT(N);
   case ISD::FP_ROUND: return visitFP_ROUND(N);
   case ISD::FP_ROUND_INREG: return visitFP_ROUND_INREG(N);
   case ISD::FP_EXTEND: return visitFP_EXTEND(N);
   case ISD::FNEG: return visitFNEG(N);
@@ SDValue DAGCombiner::visitFCOPYSIGN(SDNode *N) {
   // copysign(x, fp_extend(y)) -> copysign(x, y)
   // copysign(x, fp_round(y)) -> copysign(x, y)
   if (CanCombineFCOPYSIGN_EXTEND_ROUND(N))
     return DAG.getNode(ISD::FCOPYSIGN, SDLoc(N), VT, N0, N1.getOperand(0));

   return SDValue();
 }

+SDValue DAGCombiner::visitFPOW(SDNode *N) {
+  ConstantFPSDNode *ExponentC = isConstOrConstSplatFP(N->getOperand(1));
+  if (!ExponentC)
+    return SDValue();
+
+  // Try to convert x ** (1/4) into square roots.
+  // x ** (1/2) is canonicalized to sqrt, so we do not bother with that case.
+  // TODO: This could be extended (using a target hook) to handle smaller
+  // power-of-2 fractional exponents.
+  if (ExponentC->getValueAPF().isExactlyValue(0.25)) {
+    // pow(-0.0, 0.25) = +0.0; sqrt(sqrt(-0.0)) = -0.0.
+    // pow(-inf, 0.25) = +inf; sqrt(sqrt(-inf)) = NaN.
+    // For regular numbers, rounding may cause the results to differ.
+    // Therefore, we require { nsz ninf afn } for this transform.
+    // TODO: We could select out the special cases if we don't have nsz/ninf.
+    SDNodeFlags Flags = N->getFlags();
+    if (!Flags.hasNoSignedZeros() || !Flags.hasNoInfs() ||
+        !Flags.hasApproximateFuncs())
+      return SDValue();
+
+    // Don't double the number of libcalls. We are trying to inline fast code.
+    EVT VT = N->getValueType(0);
+    if (!DAG.getTargetLoweringInfo().isOperationLegalOrCustom(ISD::FSQRT, VT))
+      return SDValue();
+
+    // Assume that libcalls are the smallest code.
+    // TODO: This restriction should probably be lifted for vectors.
+    if (DAG.getMachineFunction().getFunction().optForSize())
+      return SDValue();
+
+    // pow(X, 0.25) --> sqrt(sqrt(X))
+    SDLoc DL(N);
+    SDValue Sqrt = DAG.getNode(ISD::FSQRT, DL, VT, N->getOperand(0), Flags);
+    return DAG.getNode(ISD::FSQRT, DL, VT, Sqrt, Flags);
+  }
+
+  return SDValue();
+}
+
 static SDValue foldFPToIntToFP(SDNode *N, SelectionDAG &DAG,
                                const TargetLowering &TLI) {
   // This optimization is guarded by a function attribute because it may produce
   // unexpected results. Ie, programs may be relying on the platform-specific
   // undefined behavior when the float-to-int conversion overflows.
   const Function &F = DAG.getMachineFunction().getFunction();
   Attribute StrictOverflow = F.getFnAttribute("strict-float-cast-overflow");
   if (StrictOverflow.getValueAsString().equals("false"))

test/CodeGen/AArch64/pow.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s

 declare float @llvm.pow.f32(float, float)
 declare <4 x float> @llvm.pow.v4f32(<4 x float>, <4 x float>)

 declare double @llvm.pow.f64(double, double)
 declare <2 x double> @llvm.pow.v2f64(<2 x double>, <2 x double>)

 define float @pow_f32_one_fourth_fmf(float %x) nounwind {
 ; CHECK-LABEL: pow_f32_one_fourth_fmf:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: fmov s1, #0.25000000
-; CHECK-NEXT: b powf
+; CHECK-NEXT: fsqrt s0, s0
+; CHECK-NEXT: fsqrt s0, s0
+; CHECK-NEXT: ret
   %r = call nsz ninf afn float @llvm.pow.f32(float %x, float 2.5e-01)
   ret float %r
 }

 define double @pow_f64_one_fourth_fmf(double %x) nounwind {
 ; CHECK-LABEL: pow_f64_one_fourth_fmf:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: fmov d1, #0.25000000
-; CHECK-NEXT: b pow
+; CHECK-NEXT: fsqrt d0, d0
+; CHECK-NEXT: fsqrt d0, d0
+; CHECK-NEXT: ret
   %r = call nsz ninf afn double @llvm.pow.f64(double %x, double 2.5e-01)
   ret double %r
 }

 define <4 x float> @pow_v4f32_one_fourth_fmf(<4 x float> %x) nounwind {
 ; CHECK-LABEL: pow_v4f32_one_fourth_fmf:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: sub sp, sp, #48 // =48
-; CHECK-NEXT: str d8, [sp, #32] // 8-byte Folded Spill
-; CHECK-NEXT: fmov s8, #0.25000000
-; CHECK-NEXT: str q0, [sp, #16] // 16-byte Folded Spill
-; CHECK-NEXT: mov s0, v0.s[1]
-; CHECK-NEXT: mov v1.16b, v8.16b
-; CHECK-NEXT: str x30, [sp, #40] // 8-byte Folded Spill
-; CHECK-NEXT: bl powf
-; CHECK-NEXT: str d0, [sp] // 16-byte Folded Spill
-; CHECK-NEXT: ldr q0, [sp, #16] // 16-byte Folded Reload
-; CHECK-NEXT: mov v1.16b, v8.16b
-; CHECK-NEXT: // kill: def $s0 killed $s0 killed $q0
-; CHECK-NEXT: bl powf
-; CHECK-NEXT: ldr q1, [sp] // 16-byte Folded Reload
-; CHECK-NEXT: // kill: def $s0 killed $s0 def $q0
-; CHECK-NEXT: mov v0.s[1], v1.s[0]
-; CHECK-NEXT: str q0, [sp] // 16-byte Folded Spill
-; CHECK-NEXT: ldr q0, [sp, #16] // 16-byte Folded Reload
-; CHECK-NEXT: mov v1.16b, v8.16b
-; CHECK-NEXT: mov s0, v0.s[2]
-; CHECK-NEXT: bl powf
-; CHECK-NEXT: ldr q1, [sp] // 16-byte Folded Reload
-; CHECK-NEXT: // kill: def $s0 killed $s0 def $q0
-; CHECK-NEXT: mov v1.s[2], v0.s[0]
-; CHECK-NEXT: ldr q0, [sp, #16] // 16-byte Folded Reload
-; CHECK-NEXT: str q1, [sp] // 16-byte Folded Spill
-; CHECK-NEXT: mov v1.16b, v8.16b
-; CHECK-NEXT: mov s0, v0.s[3]
-; CHECK-NEXT: bl powf
-; CHECK-NEXT: ldr q1, [sp] // 16-byte Folded Reload
-; CHECK-NEXT: ldr x30, [sp, #40] // 8-byte Folded Reload
-; CHECK-NEXT: ldr d8, [sp, #32] // 8-byte Folded Reload
-; CHECK-NEXT: // kill: def $s0 killed $s0 def $q0
-; CHECK-NEXT: mov v1.s[3], v0.s[0]
-; CHECK-NEXT: mov v0.16b, v1.16b
-; CHECK-NEXT: add sp, sp, #48 // =48
+; CHECK-NEXT: fsqrt v0.4s, v0.4s
+; CHECK-NEXT: fsqrt v0.4s, v0.4s
 ; CHECK-NEXT: ret
   %r = call fast <4 x float> @llvm.pow.v4f32(<4 x float> %x, <4 x float> <float 2.5e-1, float 2.5e-1, float 2.5e-01, float 2.5e-01>)
   ret <4 x float> %r
 }

 define <2 x double> @pow_v2f64_one_fourth_fmf(<2 x double> %x) nounwind {
 ; CHECK-LABEL: pow_v2f64_one_fourth_fmf:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: sub sp, sp, #48 // =48
-; CHECK-NEXT: str d8, [sp, #32] // 8-byte Folded Spill
-; CHECK-NEXT: fmov d8, #0.25000000
-; CHECK-NEXT: str q0, [sp] // 16-byte Folded Spill
-; CHECK-NEXT: mov d0, v0.d[1]
-; CHECK-NEXT: mov v1.16b, v8.16b
-; CHECK-NEXT: str x30, [sp, #40] // 8-byte Folded Spill
-; CHECK-NEXT: bl pow
-; CHECK-NEXT: str q0, [sp, #16] // 16-byte Folded Spill
-; CHECK-NEXT: ldr q0, [sp] // 16-byte Folded Reload
-; CHECK-NEXT: mov v1.16b, v8.16b
-; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
-; CHECK-NEXT: bl pow
-; CHECK-NEXT: ldr q1, [sp, #16] // 16-byte Folded Reload
-; CHECK-NEXT: ldr x30, [sp, #40] // 8-byte Folded Reload
-; CHECK-NEXT: ldr d8, [sp, #32] // 8-byte Folded Reload
-; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-NEXT: mov v0.d[1], v1.d[0]
-; CHECK-NEXT: add sp, sp, #48 // =48
+; CHECK-NEXT: fsqrt v0.2d, v0.2d
+; CHECK-NEXT: fsqrt v0.2d, v0.2d
 ; CHECK-NEXT: ret
   %r = call fast <2 x double> @llvm.pow.v2f64(<2 x double> %x, <2 x double> <double 2.5e-1, double 2.5e-1>)
   ret <2 x double> %r
 }

 define float @pow_f32_one_fourth_not_enough_fmf(float %x) nounwind {
 ; CHECK-LABEL: pow_f32_one_fourth_not_enough_fmf:
 ; CHECK: // %bb.0:

test/CodeGen/X86/pow.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-- | FileCheck %s

 declare float @llvm.pow.f32(float, float)
 declare <4 x float> @llvm.pow.v4f32(<4 x float>, <4 x float>)

 declare double @llvm.pow.f64(double, double)
 declare <2 x double> @llvm.pow.v2f64(<2 x double>, <2 x double>)

 define float @pow_f32_one_fourth_fmf(float %x) nounwind {
 ; CHECK-LABEL: pow_f32_one_fourth_fmf:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; CHECK-NEXT: jmp powf # TAILCALL
+; CHECK-NEXT: rsqrtss %xmm0, %xmm1
+; CHECK-NEXT: movaps %xmm0, %xmm2
+; CHECK-NEXT: mulss %xmm1, %xmm2
+; CHECK-NEXT: movss {{.*#+}} xmm3 = mem[0],zero,zero,zero
+; CHECK-NEXT: movaps %xmm2, %xmm4
+; CHECK-NEXT: mulss %xmm3, %xmm4
+; CHECK-NEXT: mulss %xmm1, %xmm2
+; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
+; CHECK-NEXT: addss %xmm1, %xmm2
+; CHECK-NEXT: mulss %xmm4, %xmm2
+; CHECK-NEXT: xorps %xmm4, %xmm4
+; CHECK-NEXT: cmpeqss %xmm4, %xmm0
+; CHECK-NEXT: andnps %xmm2, %xmm0
+; CHECK-NEXT: xorps %xmm2, %xmm2
+; CHECK-NEXT: rsqrtss %xmm0, %xmm2
+; CHECK-NEXT: movaps %xmm0, %xmm5
+; CHECK-NEXT: mulss %xmm2, %xmm5
+; CHECK-NEXT: mulss %xmm5, %xmm3
+; CHECK-NEXT: mulss %xmm2, %xmm5
+; CHECK-NEXT: addss %xmm1, %xmm5
+; CHECK-NEXT: mulss %xmm3, %xmm5
+; CHECK-NEXT: cmpeqss %xmm4, %xmm0
+; CHECK-NEXT: andnps %xmm5, %xmm0
+; CHECK-NEXT: retq
   %r = call nsz ninf afn float @llvm.pow.f32(float %x, float 2.5e-01)
   ret float %r
 }

 define double @pow_f64_one_fourth_fmf(double %x) nounwind {
 ; CHECK-LABEL: pow_f64_one_fourth_fmf:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
-; CHECK-NEXT: jmp pow # TAILCALL
+; CHECK-NEXT: sqrtsd %xmm0, %xmm0
+; CHECK-NEXT: sqrtsd %xmm0, %xmm0
+; CHECK-NEXT: retq
   %r = call nsz ninf afn double @llvm.pow.f64(double %x, double 2.5e-01)
   ret double %r
 }

 define <4 x float> @pow_v4f32_one_fourth_fmf(<4 x float> %x) nounwind {
 ; CHECK-LABEL: pow_v4f32_one_fourth_fmf:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: subq $56, %rsp
-; CHECK-NEXT: movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,1,2,3]
-; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; CHECK-NEXT: callq powf
-; CHECK-NEXT: movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
-; CHECK-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
-; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; CHECK-NEXT: callq powf
-; CHECK-NEXT: unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT: # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
-; CHECK-NEXT: movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
-; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; CHECK-NEXT: callq powf
-; CHECK-NEXT: movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
-; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,2,3]
-; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; CHECK-NEXT: callq powf
-; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
-; CHECK-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT: unpcklpd (%rsp), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT: # xmm1 = xmm1[0],mem[0]
-; CHECK-NEXT: movaps %xmm1, %xmm0
-; CHECK-NEXT: addq $56, %rsp
+; CHECK-NEXT: rsqrtps %xmm0, %xmm1
+; CHECK-NEXT: movaps %xmm0, %xmm2
+; CHECK-NEXT: mulps %xmm1, %xmm2
+; CHECK-NEXT: movaps {{.*#+}} xmm3 = [-5.000000e-01,-5.000000e-01,-5.000000e-01,-5.000000e-01]
+; CHECK-NEXT: movaps %xmm2, %xmm4
+; CHECK-NEXT: mulps %xmm3, %xmm4
+; CHECK-NEXT: mulps %xmm1, %xmm2
+; CHECK-NEXT: movaps {{.*#+}} xmm1 = [-3.000000e+00,-3.000000e+00,-3.000000e+00,-3.000000e+00]
+; CHECK-NEXT: addps %xmm1, %xmm2
+; CHECK-NEXT: mulps %xmm4, %xmm2
+; CHECK-NEXT: xorps %xmm4, %xmm4
+; CHECK-NEXT: cmpneqps %xmm4, %xmm0
+; CHECK-NEXT: andps %xmm2, %xmm0
+; CHECK-NEXT: rsqrtps %xmm0, %xmm2
+; CHECK-NEXT: movaps %xmm0, %xmm5
+; CHECK-NEXT: mulps %xmm2, %xmm5
+; CHECK-NEXT: mulps %xmm5, %xmm3
+; CHECK-NEXT: mulps %xmm2, %xmm5
+; CHECK-NEXT: addps %xmm1, %xmm5
+; CHECK-NEXT: mulps %xmm3, %xmm5
+; CHECK-NEXT: cmpneqps %xmm4, %xmm0
+; CHECK-NEXT: andps %xmm5, %xmm0
 ; CHECK-NEXT: retq
   %r = call fast <4 x float> @llvm.pow.v4f32(<4 x float> %x, <4 x float> <float 2.5e-1, float 2.5e-1, float 2.5e-01, float 2.5e-01>)
   ret <4 x float> %r
 }

 define <2 x double> @pow_v2f64_one_fourth_fmf(<2 x double> %x) nounwind {
 ; CHECK-LABEL: pow_v2f64_one_fourth_fmf:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: subq $40, %rsp
-; CHECK-NEXT: movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
-; CHECK-NEXT: callq pow
-; CHECK-NEXT: movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT: movaps (%rsp), %xmm0 # 16-byte Reload
-; CHECK-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
-; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
-; CHECK-NEXT: callq pow
-; CHECK-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
-; CHECK-NEXT: movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
-; CHECK-NEXT: movaps %xmm1, %xmm0
-; CHECK-NEXT: addq $40, %rsp
+; CHECK-NEXT: sqrtpd %xmm0, %xmm0
+; CHECK-NEXT: sqrtpd %xmm0, %xmm0
 ; CHECK-NEXT: retq
   %r = call fast <2 x double> @llvm.pow.v2f64(<2 x double> %x, <2 x double> <double 2.5e-1, double 2.5e-1>)
   ret <2 x double> %r
 }

 define float @pow_f32_one_fourth_not_enough_fmf(float %x) nounwind {
 ; CHECK-LABEL: pow_f32_one_fourth_not_enough_fmf:
 ; CHECK: # %bb.0: