This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Always use rcp + mul with fast math
Closed, Public

Authored by rampitec on Jun 29 2017, 4:16 PM.


Details

Reviewers

b-sumner
arsenm

Commits

rG9d7b1c9ddba6: [AMDGPU] Always use rcp + mul with fast math
rL307308: [AMDGPU] Always use rcp + mul with fast math

Summary

Regardless of relaxation options such as -cl-fast-relaxed-math,
we produce rather long code for fdiv via the amdgcn_fdiv_fast
intrinsic. This intrinsic replaces an fdiv that carries 2.5 ulp
fpmath metadata and does not handle denormals, which is why it was
believed to be fast.

An fdiv instruction can also carry a fast math flag, either by itself
or together with fpmath metadata. Clang, when used with a relaxation
flag, always produces both the metadata and the fast flag:

%div = fdiv fast float %v, %0, !fpmath !12
!12 = !{float 2.500000e+00}

The current implementation ignores the fast flag and favors the
metadata. An instruction with just the fast flag would be lowered to
the fastest sequence, rcp + mul, but that never happens in practice
because of the combined clang and backend behavior described above.

This change allows an "fdiv fast" to be always lowered as rcp + mul.
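
As a rough illustration of the rewrite described in the summary, the sketch below models x / y as x * rcp(y) in plain C++. It is not LLVM or AMDGPU code: rcpModel, fdivFastModel and relError are hypothetical names, and rcpModel uses an ordinary IEEE division as a stand-in for the hardware reciprocal, which is only an approximation and does not support denormals.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the hardware reciprocal instruction.
// Assumption: modeled with an ordinary float division of 1.0 by y.
inline float rcpModel(float y) { return 1.0f / y; }

// x / y rewritten as x * rcp(y): one reciprocal plus one multiply.
inline float fdivFastModel(float x, float y) { return x * rcpModel(y); }

// Relative difference between the rewritten quotient and the
// directly computed one (y and the quotient must be nonzero).
inline float relError(float x, float y) {
  const float exact = x / y;
  return std::fabs(fdivFastModel(x, y) - exact) / std::fabs(exact);
}
```

For example, fdivFastModel(8.0f, 2.0f) is exactly 4.0f because 1/2 is representable, while relError(10.0f, 3.0f) is small but may be nonzero because the reciprocal and the product are each rounded.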

Diff Detail

Event Timeline

rampitec created this revision. Jun 29 2017, 4:16 PM

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. Jun 29 2017, 4:16 PM

arsenm added inline comments. Jun 29 2017, 4:22 PM

lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779	This is a possible source of errors now that less-strict flags can trigger this.

rampitec added inline comments. Jun 29 2017, 4:30 PM

lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779	Triggering this is the intent. That is how the HSAIL compiler works, and we have had no complaints so far. Anyway, I am running conformance now. A philosophical question, though: which should prevail, unsafe fp or denorm support? Once again, HSAIL favors relaxation, so I did the same. In fact, this implementation is stricter than the one we have in HSAIL: there we applied the options to the library as well, while here they are only applied to user code.

rampitec added inline comments. Jun 29 2017, 4:59 PM

lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779	Actually fp denorms are not supported with 2.5 ulp fdiv even now, so there is no change in this respect.

rampitec added inline comments. Jun 29 2017, 5:56 PM

lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779	JFYI. OCL conformance passed.

arsenm added inline comments. Jun 29 2017, 6:20 PM

lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779	Adding unsafe algebra to the node from just arcp is incorrect. I wouldn't expect this to show up in conformance, but it could potentially introduce unsafe algebraic transforms in use instructions. This should at most preserve the original set of math flags, not promote them.

Preserved flags on a new node instead of forging them.

rampitec added inline comments. Jun 29 2017, 8:17 PM

lib/Target/AMDGPU/SIISelLowering.cpp
3778–3779	Strange that was not noticed when this code was written. Fixed.

Ping.

LGTM. This could still preserve the intersection of the incoming flags but that's a separate patch
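
To make the flag discussion above concrete, here is a small sketch that contrasts promoting, preserving and intersecting fast-math flags. It uses a hypothetical bitmask rather than LLVM's SDNodeFlags, and intersectFlags is only one possible reading of the "intersection of the incoming flags" suggestion, not something taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bitmask standing in for a node's fast-math flags.
enum FMFlag : std::uint8_t {
  NoFlags = 0,
  AllowReciprocal = 1 << 0, // arcp
  UnsafeAlgebra = 1 << 1,   // the "unsafe algebra" flag from the review
};

// Behavior before this change: the new multiply is stamped with unsafe
// algebra no matter which flags the division carried.
constexpr std::uint8_t promoteFlags(std::uint8_t /*divFlags*/) {
  return UnsafeAlgebra;
}

// Behavior after the update: the division's flags are passed through.
constexpr std::uint8_t preserveFlags(std::uint8_t divFlags) {
  return divFlags;
}

// Assumed reading of the follow-up idea: keep only the flags that are
// present on both contributing nodes.
constexpr std::uint8_t intersectFlags(std::uint8_t a, std::uint8_t b) {
  return a & b;
}
```

With an arcp-only division, promoteFlags yields UnsafeAlgebra (a stronger permission than the input had), whereas preserveFlags yields AllowReciprocal unchanged.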

This revision is now accepted and ready to land. Jul 6 2017, 11:41 AM

Closed by commit rL307308: [AMDGPU] Always use rcp + mul with fast math (authored by rampitec). Jul 6 2017, 1:34 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp	4 lines
lib/Target/AMDGPU/SIISelLowering.cpp	12 lines
test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll	38 lines
test/CodeGen/AMDGPU/fdiv.ll	37 lines

Diff 104812

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

[...] bool AMDGPUCodeGenPrepare::visitFDiv(BinaryOperator &FDiv) {
   const FPMathOperator *FPOp = cast<const FPMathOperator>(&FDiv);
   float ULP = FPOp->getFPAccuracy();
   if (ULP < 2.5f)
     return false;

   FastMathFlags FMF = FPOp->getFastMathFlags();
   bool UnsafeDiv = HasUnsafeFPMath || FMF.unsafeAlgebra() ||
                    FMF.allowReciprocal();
-  if (ST->hasFP32Denormals() && !UnsafeDiv)
+  // With UnsafeDiv node will be optimized to just rcp and mul.
+  if (ST->hasFP32Denormals() || UnsafeDiv)
     return false;

   IRBuilder<> Builder(FDiv.getParent(), std::next(FDiv.getIterator()), FPMath);
   Builder.setFastMathFlags(FMF);
   Builder.SetCurrentDebugLocation(FDiv.getDebugLoc());

   Function *Decl = Intrinsic::getDeclaration(Mod, Intrinsic::amdgcn_fdiv_fast);

[...]
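
The effect of the changed condition in this hunk can be sketched as two plain predicates. This models only the two checks visible above; the real visitFDiv has further checks in the lines not shown, and the function and parameter names here are illustrative, not LLVM's.

```cpp
#include <cassert>

// Returns true when the fdiv gets past the two visible early exits and
// so reaches the code that builds a call to llvm.amdgcn.fdiv.fast.
// Condition as it was before the change.
constexpr bool passesChecksBefore(float ulp, bool fp32Denormals,
                                  bool unsafeDiv) {
  if (ulp < 2.5f)
    return false; // accuracy requirement too strict for the intrinsic
  if (fp32Denormals && !unsafeDiv)
    return false;
  return true;
}

// Condition after the change: an unsafe division is left as a plain
// fdiv so that later lowering can turn it into rcp + mul.
constexpr bool passesChecksAfter(float ulp, bool fp32Denormals,
                                 bool unsafeDiv) {
  if (ulp < 2.5f)
    return false;
  if (fp32Denormals || unsafeDiv)
    return false;
  return true;
}
```

For a 2.5 ulp division with a fast or arcp flag (unsafeDiv true), passesChecksBefore is true and passesChecksAfter is false, which matches the updated CHECK lines that now expect a plain fdiv instead of the intrinsic call.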

lib/Target/AMDGPU/SIISelLowering.cpp

[...]
 // Catch division cases where we can use shortcuts with rcp and rsq
 // instructions.
 SDValue SITargetLowering::lowerFastUnsafeFDIV(SDValue Op,
                                               SelectionDAG &DAG) const {
   SDLoc SL(Op);
   SDValue LHS = Op.getOperand(0);
   SDValue RHS = Op.getOperand(1);
   EVT VT = Op.getValueType();
-  bool Unsafe = DAG.getTarget().Options.UnsafeFPMath;
+  const SDNodeFlags Flags = Op->getFlags();
+  bool Unsafe = DAG.getTarget().Options.UnsafeFPMath ||
+                Flags.hasUnsafeAlgebra() || Flags.hasAllowReciprocal();

   if (!Unsafe && VT == MVT::f32 && Subtarget->hasFP32Denormals())
     return SDValue();

   if (const ConstantFPSDNode *CLHS = dyn_cast<ConstantFPSDNode>(LHS)) {
     if (Unsafe || VT == MVT::f32 || VT == MVT::f16) {
       if (CLHS->isExactlyValue(1.0)) {
         // v_rcp_f32 and v_rsq_f32 do not support denormals, and according to
[... 18 lines not shown, inside: if (Unsafe || VT == MVT::f32 || VT == MVT::f16) { ...]
       if (CLHS->isExactlyValue(-1.0)) {
         // -1.0 / x -> rcp (fneg x)
         SDValue FNegRHS = DAG.getNode(ISD::FNEG, SL, VT, RHS);
         return DAG.getNode(AMDGPUISD::RCP, SL, VT, FNegRHS);
       }
     }
   }

-  const SDNodeFlags Flags = Op->getFlags();
-
-  if (Unsafe || Flags.hasAllowReciprocal()) {
+  if (Unsafe) {
     // Turn into multiply by the reciprocal.
     // x / y -> x * (1.0 / y)
-    SDNodeFlags NewFlags;
-    NewFlags.setUnsafeAlgebra(true);
     SDValue Recip = DAG.getNode(AMDGPUISD::RCP, SL, VT, RHS);
-    return DAG.getNode(ISD::FMUL, SL, VT, LHS, Recip, NewFlags);
+    return DAG.getNode(ISD::FMUL, SL, VT, LHS, Recip, Flags);
   }

   return SDValue();
 }

 static SDValue getFPBinOp(SelectionDAG &DAG, unsigned Opcode, const SDLoc &SL,
                           EVT VT, SDValue A, SDValue B, SDValue GlueChain) {
   if (GlueChain->getNumValues() <= 1) {
[...]
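
A simplified model of the decision this hunk changes is sketched below, for the general x / y path only (the constant 1.0 and -1.0 numerator shortcuts are left out). DivInfo, Lowering, lowerBefore and lowerAfter are hypothetical names introduced for illustration; the fields mirror the queries visible in the diff.

```cpp
#include <cassert>

enum class Lowering { RcpMul, Fallback };

// Hypothetical summary of the inputs the function consults.
struct DivInfo {
  bool globalUnsafeFPMath; // models Options.UnsafeFPMath
  bool unsafeAlgebra;      // models Flags.hasUnsafeAlgebra()
  bool allowReciprocal;    // models Flags.hasAllowReciprocal()
  bool isF32;              // models VT == MVT::f32
  bool fp32Denormals;      // models Subtarget->hasFP32Denormals()
};

// Before: only the global option counts as "unsafe" for the denormal
// bail-out, so a flagged f32 division with denormals enabled falls back.
constexpr Lowering lowerBefore(const DivInfo &d) {
  const bool unsafe = d.globalUnsafeFPMath;
  if (!unsafe && d.isF32 && d.fp32Denormals)
    return Lowering::Fallback;
  return (unsafe || d.allowReciprocal) ? Lowering::RcpMul
                                       : Lowering::Fallback;
}

// After: per-instruction flags are folded into "unsafe" up front.
constexpr Lowering lowerAfter(const DivInfo &d) {
  const bool unsafe =
      d.globalUnsafeFPMath || d.unsafeAlgebra || d.allowReciprocal;
  if (!unsafe && d.isF32 && d.fp32Denormals)
    return Lowering::Fallback;
  return unsafe ? Lowering::RcpMul : Lowering::Fallback;
}
```

For an f32 "fdiv fast" on a subtarget with denormals enabled, lowerBefore gives Fallback and lowerAfter gives RcpMul, which is the behavior change the fdiv_fast_denormals_f32 test in fdiv.ll captures.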

test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll

[... 10 lines not shown ...]
 }

 ; CHECK-LABEL: @fdiv_fpmath(
 ; CHECK: %no.md = fdiv float %a, %b{{$}}
 ; CHECK: %md.half.ulp = fdiv float %a, %b, !fpmath !1
 ; CHECK: %md.1ulp = fdiv float %a, %b, !fpmath !2
 ; CHECK: %md.25ulp = call float @llvm.amdgcn.fdiv.fast(float %a, float %b), !fpmath !0
 ; CHECK: %md.3ulp = call float @llvm.amdgcn.fdiv.fast(float %a, float %b), !fpmath !3
-; CHECK: %fast.md.25ulp = call fast float @llvm.amdgcn.fdiv.fast(float %a, float %b), !fpmath !0
-; CHECK: arcp.md.25ulp = call arcp float @llvm.amdgcn.fdiv.fast(float %a, float %b), !fpmath !0
+; CHECK: %fast.md.25ulp = fdiv fast float %a, %b, !fpmath !0
+; CHECK: arcp.md.25ulp = fdiv arcp float %a, %b, !fpmath !0
 define amdgpu_kernel void @fdiv_fpmath(float addrspace(1)* %out, float %a, float %b) #1 {
   %no.md = fdiv float %a, %b
   store volatile float %no.md, float addrspace(1)* %out

   %md.half.ulp = fdiv float %a, %b, !fpmath !1
   store volatile float %md.half.ulp, float addrspace(1)* %out

   %md.1ulp = fdiv float %a, %b, !fpmath !2
[... lines not shown, inside: define amdgpu_kernel void @fdiv_fpmath_vector(<2 x float> addrspace(1)* %out, <2 x float> %a, <2 x float> %b) #1 { ...]
   ret void
 }

 ; CHECK-LABEL: @rcp_fdiv_fpmath_vector(
 ; CHECK: %no.md = fdiv <2 x float> <float 1.000000e+00, float 1.000000e+00>, %x{{$}}
 ; CHECK: %md.half.ulp = fdiv <2 x float> <float 1.000000e+00, float 1.000000e+00>, %x, !fpmath !1
 ; CHECK: %arcp.no.md = fdiv arcp <2 x float> <float 1.000000e+00, float 1.000000e+00>, %x{{$}}
 ; CHECK: %fast.no.md = fdiv fast <2 x float> <float 1.000000e+00, float 1.000000e+00>, %x{{$}}
-; CHECK: extractelement <2 x float> %x
-; CHECK: fdiv arcp float 1.000000e+00, %{{[0-9]+}}, !fpmath !0
-; CHECK: extractelement <2 x float> %x
-; CHECK: fdiv arcp float 1.000000e+00, %{{[0-9]+}}, !fpmath !0
-; CHECK: store volatile <2 x float> %arcp.25ulp
-
-; CHECK: fdiv fast float 1.000000e+00, %{{[0-9]+}}, !fpmath !0
-; CHECK: fdiv fast float 1.000000e+00, %{{[0-9]+}}, !fpmath !0
+; CHECK: %arcp.25ulp = fdiv arcp <2 x float> <float 1.000000e+00, float 1.000000e+00>, %x, !fpmath !0
+; CHECK: %fast.25ulp = fdiv fast <2 x float> <float 1.000000e+00, float 1.000000e+00>, %x, !fpmath !0
 ; CHECK: store volatile <2 x float> %fast.25ulp, <2 x float> addrspace(1)* %out
 define amdgpu_kernel void @rcp_fdiv_fpmath_vector(<2 x float> addrspace(1)* %out, <2 x float> %x) #1 {
   %no.md = fdiv <2 x float> <float 1.0, float 1.0>, %x
   store volatile <2 x float> %no.md, <2 x float> addrspace(1)* %out

   %md.half.ulp = fdiv <2 x float> <float 1.0, float 1.0>, %x, !fpmath !1
   store volatile <2 x float> %md.half.ulp, <2 x float> addrspace(1)* %out

[... 11 lines not shown, inside: define amdgpu_kernel void @rcp_fdiv_fpmath_vector(<2 x float> addrspace(1)* %out, <2 x float> %x) #1 { ...]

   ret void
 }

 ; CHECK-LABEL: @rcp_fdiv_fpmath_vector_nonsplat(
 ; CHECK: %no.md = fdiv <2 x float> <float 1.000000e+00, float 2.000000e+00>, %x
 ; CHECK: %arcp.no.md = fdiv arcp <2 x float> <float 1.000000e+00, float 2.000000e+00>, %x
 ; CHECK: %fast.no.md = fdiv fast <2 x float> <float 1.000000e+00, float 2.000000e+00>, %x{{$}}
-; CHECK: %[[X0:[0-9]+]] = extractelement <2 x float> %x, i64 0
-; CHECK: fdiv arcp float 1.000000e+00, %[[X0]], !fpmath !0
-; CHECK: %[[X1:[0-9]+]] = extractelement <2 x float> %x, i64 1
-; CHECK: fdiv arcp float 2.000000e+00, %[[X1]], !fpmath !0
-; CHECK: store volatile <2 x float> %arcp.25ulp
-
-; CHECK: %[[X0:[0-9]+]] = extractelement <2 x float> %x, i64 0
-; CHECK: fdiv fast float 1.000000e+00, %[[X0]], !fpmath !0
-; CHECK: %[[X1:[0-9]+]] = extractelement <2 x float> %x, i64 1
-; CHECK: fdiv fast float 2.000000e+00, %[[X1]], !fpmath !0
+; CHECK: %arcp.25ulp = fdiv arcp <2 x float> <float 1.000000e+00, float 2.000000e+00>, %x, !fpmath !0
+; CHECK: %fast.25ulp = fdiv fast <2 x float> <float 1.000000e+00, float 2.000000e+00>, %x, !fpmath !0
 ; CHECK: store volatile <2 x float> %fast.25ulp
 define amdgpu_kernel void @rcp_fdiv_fpmath_vector_nonsplat(<2 x float> addrspace(1)* %out, <2 x float> %x) #1 {
   %no.md = fdiv <2 x float> <float 1.0, float 2.0>, %x
   store volatile <2 x float> %no.md, <2 x float> addrspace(1)* %out

   %arcp.no.md = fdiv arcp <2 x float> <float 1.0, float 2.0>, %x
   store volatile <2 x float> %arcp.no.md, <2 x float> addrspace(1)* %out

   %fast.no.md = fdiv fast <2 x float> <float 1.0, float 2.0>, %x
   store volatile <2 x float> %fast.no.md, <2 x float> addrspace(1)* %out

   %arcp.25ulp = fdiv arcp <2 x float> <float 1.0, float 2.0>, %x, !fpmath !0
   store volatile <2 x float> %arcp.25ulp, <2 x float> addrspace(1)* %out

   %fast.25ulp = fdiv fast <2 x float> <float 1.0, float 2.0>, %x, !fpmath !0
   store volatile <2 x float> %fast.25ulp, <2 x float> addrspace(1)* %out

   ret void
 }

 ; FIXME: Should be able to get fdiv for 1.0 component
 ; CHECK-LABEL: @rcp_fdiv_fpmath_vector_partial_constant(
-; CHECK: call arcp float @llvm.amdgcn.fdiv.fast(float %{{[0-9]+}}, float %{{[0-9]+}}), !fpmath !0
-; CHECK: call arcp float @llvm.amdgcn.fdiv.fast(float %{{[0-9]+}}, float %{{[0-9]+}}), !fpmath !0
+; CHECK: %arcp.25ulp = fdiv arcp <2 x float> %x.insert, %y, !fpmath !0
 ; CHECK: store volatile <2 x float> %arcp.25ulp

-; CHECK: call fast float @llvm.amdgcn.fdiv.fast(float %{{[0-9]+}}, float %{{[0-9]+}}), !fpmath !0
-; CHECK: call fast float @llvm.amdgcn.fdiv.fast(float %{{[0-9]+}}, float %{{[0-9]+}}), !fpmath !0
+; CHECK: %fast.25ulp = fdiv fast <2 x float> %x.insert, %y, !fpmath !0
 ; CHECK: store volatile <2 x float> %fast.25ulp
 define amdgpu_kernel void @rcp_fdiv_fpmath_vector_partial_constant(<2 x float> addrspace(1)* %out, <2 x float> %x, <2 x float> %y) #1 {
   %x.insert = insertelement <2 x float> %x, float 1.0, i32 0

   %arcp.25ulp = fdiv arcp <2 x float> %x.insert, %y, !fpmath !0
   store volatile <2 x float> %arcp.25ulp, <2 x float> addrspace(1)* %out

   %fast.25ulp = fdiv fast <2 x float> %x.insert, %y, !fpmath !0
   store volatile <2 x float> %fast.25ulp, <2 x float> addrspace(1)* %out

   ret void
 }

 ; CHECK-LABEL: @fdiv_fpmath_f32_denormals(
 ; CHECK: %no.md = fdiv float %a, %b{{$}}
 ; CHECK: %md.half.ulp = fdiv float %a, %b, !fpmath !1
 ; CHECK: %md.1ulp = fdiv float %a, %b, !fpmath !2
 ; CHECK: %md.25ulp = fdiv float %a, %b, !fpmath !0
 ; CHECK: %md.3ulp = fdiv float %a, %b, !fpmath !3
-; CHECK: call fast float @llvm.amdgcn.fdiv.fast(float %a, float %b), !fpmath !0
-; CHECK: call arcp float @llvm.amdgcn.fdiv.fast(float %a, float %b), !fpmath !0
+; CHECK: %fast.md.25ulp = fdiv fast float %a, %b, !fpmath !0
+; CHECK: %arcp.md.25ulp = fdiv arcp float %a, %b, !fpmath !0
 define amdgpu_kernel void @fdiv_fpmath_f32_denormals(float addrspace(1)* %out, float %a, float %b) #2 {
   %no.md = fdiv float %a, %b
   store volatile float %no.md, float addrspace(1)* %out

   %md.half.ulp = fdiv float %a, %b, !fpmath !1
   store volatile float %md.half.ulp, float addrspace(1)* %out

   %md.1ulp = fdiv float %a, %b, !fpmath !2
[... 30 lines not shown ...]

test/CodeGen/AMDGPU/fdiv.ll

[...]
 define amdgpu_kernel void @fdiv_25ulp_denormals_f32(float addrspace(1)* %out, float %a, float %b) #2 {
 entry:
   %fdiv = fdiv float %a, %b, !fpmath !0
   store float %fdiv, float addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}fdiv_fast_denormals_f32:
-; GCN: v_div_scale_f32 [[NUM_SCALE:v[0-9]+]]
-; GCN-DAG: v_div_scale_f32 [[DEN_SCALE:v[0-9]+]]
-; GCN-DAG: v_rcp_f32_e32 [[NUM_RCP:v[0-9]+]], [[NUM_SCALE]]
-
-; GCN-NOT: s_setreg
-; GCN: v_fma_f32 [[A:v[0-9]+]], -[[NUM_SCALE]], [[NUM_RCP]], 1.0
-; GCN: v_fma_f32 [[B:v[0-9]+]], [[A]], [[NUM_RCP]], [[NUM_RCP]]
-; GCN: v_mul_f32_e32 [[C:v[0-9]+]], [[B]], [[DEN_SCALE]]
-; GCN: v_fma_f32 [[D:v[0-9]+]], -[[NUM_SCALE]], [[C]], [[DEN_SCALE]]
-; GCN: v_fma_f32 [[E:v[0-9]+]], [[D]], [[B]], [[C]]
-; GCN: v_fma_f32 [[F:v[0-9]+]], -[[NUM_SCALE]], [[E]], [[DEN_SCALE]]
+; GCN: v_rcp_f32_e32 [[RCP:v[0-9]+]], s{{[0-9]+}}
+; GCN: v_mul_f32_e32 [[RESULT:v[0-9]+]], s{{[0-9]+}}, [[RCP]]
+; GCN-NOT: [[RESULT]]
 ; GCN-NOT: s_setreg
-; GCN: v_div_fmas_f32 [[FMAS:v[0-9]+]], [[F]], [[B]], [[E]]
-; GCN: v_div_fixup_f32 v{{[0-9]+}}, [[FMAS]],
+; GCN: buffer_store_dword [[RESULT]]
 define amdgpu_kernel void @fdiv_fast_denormals_f32(float addrspace(1)* %out, float %a, float %b) #2 {
 entry:
   %fdiv = fdiv fast float %a, %b
   store float %fdiv, float addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}fdiv_f32_fast_math:
 ; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[2].W
 ; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].Z, PS

 ; GCN: v_rcp_f32_e32 [[RCP:v[0-9]+]], s{{[0-9]+}}
 ; GCN: v_mul_f32_e32 [[RESULT:v[0-9]+]], s{{[0-9]+}}, [[RCP]]
 ; GCN-NOT: [[RESULT]]
 ; GCN: buffer_store_dword [[RESULT]]
 define amdgpu_kernel void @fdiv_f32_fast_math(float addrspace(1)* %out, float %a, float %b) #0 {
 entry:
   %fdiv = fdiv fast float %a, %b
   store float %fdiv, float addrspace(1)* %out
   ret void
 }

+; FUNC-LABEL: {{^}}fdiv_ulp25_f32_fast_math:
+; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[2].W
+; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].Z, PS
+
+; GCN: v_rcp_f32_e32 [[RCP:v[0-9]+]], s{{[0-9]+}}
+; GCN: v_mul_f32_e32 [[RESULT:v[0-9]+]], s{{[0-9]+}}, [[RCP]]
+; GCN-NOT: [[RESULT]]
+; GCN: buffer_store_dword [[RESULT]]
+define amdgpu_kernel void @fdiv_ulp25_f32_fast_math(float addrspace(1)* %out, float %a, float %b) #0 {
+entry:
+  %fdiv = fdiv fast float %a, %b, !fpmath !0
+  store float %fdiv, float addrspace(1)* %out
+  ret void
+}
+
 ; FUNC-LABEL: {{^}}fdiv_f32_arcp_math:
 ; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[2].W
 ; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].Z, PS

 ; GCN: v_rcp_f32_e32 [[RCP:v[0-9]+]], s{{[0-9]+}}
 ; GCN: v_mul_f32_e32 [[RESULT:v[0-9]+]], s{{[0-9]+}}, [[RCP]]
 ; GCN-NOT: [[RESULT]]
 ; GCN: buffer_store_dword [[RESULT]]
[... 17 lines not shown ...]
 define amdgpu_kernel void @fdiv_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %a, <2 x float> %b) #0 {
 entry:
   %fdiv = fdiv <2 x float> %a, %b
   store <2 x float> %fdiv, <2 x float> addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}fdiv_ulp25_v2f32:
-; GCN: v_cmp_gt_f32
-; GCN: v_cmp_gt_f32
+; GCN: v_rcp_f32
+; GCN: v_rcp_f32
+; GCN-NOT: v_cmp_gt_f32
 define amdgpu_kernel void @fdiv_ulp25_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %a, <2 x float> %b) #0 {
 entry:
   %fdiv = fdiv arcp <2 x float> %a, %b, !fpmath !0
   store <2 x float> %fdiv, <2 x float> addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}fdiv_v2f32_fast_math:
[...]